eXplainable Cooperative Machine Learning with NOVA

2020 ◽  
Vol 34 (2) ◽  
pp. 143-164 ◽  
Author(s):  
Tobias Baur ◽  
Alexander Heimerl ◽  
Florian Lingenfelser ◽  
Johannes Wagner ◽  
Michel F. Valstar ◽  
...  

Abstract In the following article, we introduce a novel workflow, which we subsume under the term “explainable cooperative machine learning”, and show its practical application in NOVA, a data annotation and model training tool. The main idea of our approach is to interactively incorporate the ‘human in the loop’ when training classification models from annotated data. In particular, NOVA offers a collaborative annotation backend where multiple annotators can join forces. A key aspect is the possibility of applying semi-supervised active learning techniques already during the annotation process: data can be pre-labelled automatically, which drastically accelerates annotation. Furthermore, the user interface implements recent eXplainable AI (XAI) techniques to provide users with both a confidence value for the automatically predicted annotations and a visual explanation. We show in a use-case evaluation that our workflow speeds up the annotation process, and further argue that the additional visual explanations help annotators understand the decision-making process and judge the trustworthiness of their trained machine learning models.
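The pre-labelling idea described above can be illustrated with a short sketch. This is not NOVA's actual implementation; it assumes a scikit-learn classifier and a hypothetical `prelabel` helper that proposes labels only for frames whose predicted confidence exceeds a threshold, routing the rest back to the annotators.

```python
# Minimal sketch of confidence-based pre-labelling during annotation
# (illustrative only; not NOVA's actual implementation).
import numpy as np
from sklearn.linear_model import LogisticRegression

def prelabel(model, X_unlabelled, threshold=0.8):
    """Predict labels for unlabelled frames and keep only confident ones.

    Frames below the confidence threshold are routed back to the human
    annotators; confident ones are proposed as pre-labels.
    """
    proba = model.predict_proba(X_unlabelled)
    confidence = proba.max(axis=1)
    labels = proba.argmax(axis=1)
    confident = confidence >= threshold
    return labels[confident], confidence[confident], np.where(~confident)[0]

# Train on the frames annotated so far, then pre-label the rest.
rng = np.random.default_rng(0)
X_lab, y_lab = rng.normal(size=(200, 16)), rng.integers(0, 2, 200)
X_unlab = rng.normal(size=(1000, 16))

model = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
pre_labels, conf, needs_review = prelabel(model, X_unlab)
print(f"{len(pre_labels)} frames pre-labelled, {len(needs_review)} sent back to annotators")
```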

2020 ◽  
Author(s):  
Mikołaj Morzy ◽  
Bartłomiej Balcerzak ◽  
Adam Wierzbicki

BACKGROUND: With the rapidly accelerating dissemination of false medical information on the Web, establishing the credibility of online sources of medical information has become a pressing necessity. The sheer number of websites offering questionable medical information, presented as reliable and actionable suggestions with possibly harmful effects, poses an additional requirement for potential solutions: they have to scale to the size of the problem. Machine learning is one such solution which, when properly deployed, can be an effective tool in fighting medical disinformation on the Web.
OBJECTIVE: We present a comprehensive framework for designing and curating machine learning training datasets for online medical information credibility assessment. We show how the annotation process should be constructed and what pitfalls should be avoided. Our main objective is to provide researchers from the medical and computer science communities with guidelines on how to construct datasets for machine learning models for various areas of medical information wars.
METHODS: The key component of our approach is the active annotation process. We begin by outlining the annotation protocol for the curation of a high-quality training dataset, which can then be augmented and rapidly extended by employing the human-in-the-loop paradigm in machine learning training. To circumvent the cold-start problem of insufficient gold-standard annotations, we propose a pre-processing pipeline consisting of representation learning, clustering, and re-ranking of sentences, which accelerates training and optimizes the human resources involved in annotation.
RESULTS: We collect over 10,000 annotations of sentences related to selected subjects (psychiatry, cholesterol, autism, antibiotics, vaccines, steroids, birth methods, food allergy testing) for less than $7,000, employing nine highly qualified annotators (certified medical professionals), and we release this dataset to the general public. We develop an active annotation framework for more efficient annotation of non-credible medical statements. The results of the qualitative analysis support our claims about the efficacy of the presented method.
CONCLUSIONS: A very diverse set of incentives is driving the widespread dissemination of medical disinformation on the Web. An effective strategy for countering this spread is to use machine learning to automatically establish the credibility of online medical information. This, however, requires a thoughtful design of the training pipeline. In this paper we present a comprehensive framework of active annotation. In addition, we publish a large curated dataset of medical statements labelled as credible, non-credible, or neutral.
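A minimal sketch of the cold-start pre-processing pipeline described in the METHODS section (representation learning, clustering, re-ranking of sentences) might look as follows. TF-IDF vectors and k-means are stand-ins chosen for brevity; the original work may use different representation and clustering components.

```python
# Sketch of the cold-start pipeline: represent sentences, cluster them,
# then re-rank within clusters so annotators see the most representative
# sentences first. Components (TF-IDF, k-means) are stand-ins.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

sentences = [
    "Vaccines cause autism in children.",
    "Antibiotics are ineffective against viral infections.",
    "High cholesterol always requires statin treatment.",
    "Home birth is safer than hospital birth in every case.",
]

# 1. Representation learning (here: simple TF-IDF vectors).
X = TfidfVectorizer().fit_transform(sentences)

# 2. Clustering into topical groups.
k = 2
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)

# 3. Re-ranking: order each cluster by similarity to its centroid,
#    so the most central sentences are annotated first.
for c in range(k):
    idx = [i for i, lbl in enumerate(km.labels_) if lbl == c]
    sims = cosine_similarity(X[idx], km.cluster_centers_[c].reshape(1, -1)).ravel()
    ranked = [sentences[i] for _, i in sorted(zip(-sims, idx))]
    print(f"cluster {c}: {ranked}")
```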


2021 ◽  
Author(s):  
Junhua Huang ◽  
Bohan Zhu ◽  
Hongxi Zhou ◽  
Qiwei Zheng ◽  
Zhuo Chen ◽  
...  

With the continuous expansion of optical communication networks and the rapid increase in traffic demand, multi-domain management of optical networks has become widespread. The optical signal-to-noise ratio (OSNR) is an important indicator of communication quality, so predicting OSNR accurately in a low-cost and energy-efficient way is essential in multi-domain optical networks. In this paper, a federated learning scheme for multi-domain optical networks is proposed to improve the accuracy of OSNR prediction. The main idea is to train a hybrid machine learning model in each single domain and then use federated learning to optimize it across domains. The performance of the proposed scheme is verified by simulation experiments. The strategy alleviates the problems of data silos and limited per-domain training data caused by the multi-domain structure. According to the simulation results, when the amount of data reaches 5×10³, adding this strategy reduces the mean squared error of the prediction model by about 18%. It improves the performance of the machine learning model, the accuracy of OSNR prediction, and the reliability of network operation.
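A minimal sketch of the federation step might look as follows. It assumes simple per-domain linear OSNR regressors and size-weighted federated averaging of their weights; the paper's hybrid per-domain models and federation strategy may differ.

```python
# Minimal sketch of federated averaging across optical-network domains.
# Each domain fits a local linear OSNR regressor, and only model weights,
# not raw monitoring data, leave the domain.
import numpy as np

def local_fit(X, y):
    """Ordinary least squares with a bias term (local domain training)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def federated_average(weights, sizes):
    """Average local models, weighted by the amount of local data."""
    return np.average(np.stack(weights), axis=0, weights=np.asarray(sizes, dtype=float))

rng = np.random.default_rng(1)
true_w = np.array([0.5, -1.2, 2.0, 14.0])  # last entry: bias (dB offset)

domains = []
for n in (300, 500, 200):  # three domains with different data volumes
    X = rng.normal(size=(n, 3))  # e.g. launch power, span count, channel load
    y = np.hstack([X, np.ones((n, 1))]) @ true_w + rng.normal(scale=0.3, size=n)
    domains.append((X, y))

local_weights = [local_fit(X, y) for X, y in domains]
global_w = federated_average(local_weights, [len(X) for X, _ in domains])
print("federated OSNR model weights:", np.round(global_w, 2))
```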


2021 ◽  
Vol 7 (1) ◽  
pp. 39
Author(s):  
José Bobes-Bascarán ◽  
Eduardo Mosqueira-Rey ◽  
David Alonso-Ríos

At present, the great majority of Artificial Intelligence (AI) systems require the participation of humans in their development, tuning, and maintenance. In particular, Machine Learning (ML) systems can greatly benefit from human expertise and knowledge. Thus, there is increasing interest in how humans interact with these systems to obtain the best performance for both the AI system and the humans involved. Several approaches studied and proposed in the literature can be gathered under the umbrella term of Human-in-the-Loop Machine Learning. Applying these techniques in the health informatics environment could provide great value in prognosis and diagnosis tasks, contributing to better health services for cancer-related diseases.


2021 ◽  
Vol 22 (9) ◽  
pp. 4435
Author(s):  
Talia B. Kimber ◽  
Yonghui Chen ◽  
Andrea Volkamer

Drug discovery is a cost- and time-intensive process that is often assisted by computational methods, such as virtual screening, to speed up and guide the design of new compounds. For many years, machine learning methods have been successfully applied in the context of computer-aided drug discovery. Recently, thanks to the rise of novel technologies as well as the increasing amount of available chemical and bioactivity data, deep learning has had a tremendous impact on rational active compound discovery. Herein, recent applications and developments of machine learning, with a focus on deep learning, in virtual screening for active compound design are reviewed. This includes introducing different compound and protein encodings, deep learning techniques, and frequently used bioactivity and benchmark data sets for model training and testing. Finally, the present state of the art, including the current challenges and emerging problems, is examined and discussed.
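As a concrete illustration of a compound encoding feeding a machine learning model, the following hedged sketch builds Morgan fingerprints with RDKit and fits a random-forest activity classifier. The SMILES strings and activity labels are toy placeholders, not taken from any benchmark set discussed in the review.

```python
# Hedged sketch of a classic ML virtual-screening baseline: encode compounds
# as Morgan fingerprints (RDKit) and fit a random-forest activity classifier.
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier

def morgan_fingerprint(smiles, radius=2, n_bits=2048):
    """Encode a SMILES string as a fixed-length Morgan fingerprint vector."""
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    return np.array(list(fp), dtype=np.int8)

smiles = ["CCO", "c1ccccc1O", "CC(=O)Oc1ccccc1C(=O)O", "CCN(CC)CC"]
labels = [0, 1, 1, 0]  # toy "active"/"inactive" labels for illustration

X = np.stack([morgan_fingerprint(s) for s in smiles])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
print("predicted activity probability:",
      clf.predict_proba(morgan_fingerprint("CCOC(=O)c1ccccc1").reshape(1, -1))[:, 1])
```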


2021 ◽  
Author(s):  
Lazaros Toumanidis ◽  
Panagiotis Kasnesis ◽  
Christos Chatzigeorgiou ◽  
Michail Feidakis ◽  
Charalampos Patrikakis

A widespread practice in machine learning solutions is the continuous use of human intelligence to increase their quality and efficiency. A common problem in such solutions is the need for large amounts of labeled data. In this paper, we present a practical implementation of the human-in-the-loop computing practice that combines active learning for informed data sampling, transfer learning for weight initialization, and a cross-platform mobile application for crowdsourcing data annotation tasks. We study the use of the proposed framework in a post-event building reconnaissance scenario, where we employ an existing pre-trained computer vision model, a binary image classification solution built on top of it, and max-entropy and random sampling as sampling strategies for the active learning step. Multiple annotations with majority voting serve as quality assurance before new human-annotated images are added to the training set and the model is retrained. We provide the results and discuss our next steps.
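The two ingredients named above, max-entropy uncertainty sampling for the active learning step and majority voting as quality assurance, can be sketched as follows. Function names and thresholds are illustrative and not taken from the authors' implementation.

```python
# Sketch of max-entropy uncertainty sampling to pick the next images for
# annotation, and majority voting over multiple crowd annotations before
# an image enters the training set.
import numpy as np
from collections import Counter

def max_entropy_sample(probabilities, k):
    """Return indices of the k samples with the highest predictive entropy."""
    p = np.clip(probabilities, 1e-12, 1.0)
    entropy = -(p * np.log(p)).sum(axis=1)
    return np.argsort(entropy)[::-1][:k]

def majority_vote(annotations, min_votes=3):
    """Accept a label only if enough annotators agree (quality assurance)."""
    label, count = Counter(annotations).most_common(1)[0]
    return label if count >= min_votes else None

# Binary "damaged / not damaged" probabilities from the current classifier.
probs = np.array([[0.51, 0.49], [0.95, 0.05], [0.60, 0.40], [0.50, 0.50]])
print("query these images next:", max_entropy_sample(probs, k=2))

# Three crowd annotations for one image; 2 out of 3 is below min_votes here.
print("accepted label:", majority_vote(["damaged", "damaged", "intact"]))
```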


2020 ◽  
Author(s):  
Aleksandra Nabożny ◽  
Bartłomiej Balcerzak ◽  
Adam Wierzbicki ◽  
Mikołaj Morzy



2006 ◽  
Author(s):  
Christopher Schreiner ◽  
Kari Torkkola ◽  
Mike Gardner ◽  
Keshu Zhang

2019 ◽  
Vol 20 (5) ◽  
pp. 540-550 ◽  
Author(s):  
Jiu-Xin Tan ◽  
Hao Lv ◽  
Fang Wang ◽  
Fu-Ying Dao ◽  
Wei Chen ◽  
...  

Enzymes are proteins that act as biological catalysts to speed up cellular biochemical processes. According to their main Enzyme Commission (EC) numbers, enzymes are divided into six categories: EC-1, oxidoreductases; EC-2, transferases; EC-3, hydrolases; EC-4, lyases; EC-5, isomerases; and EC-6, synthetases. Different enzymes have different biological functions and act on different substrates. Therefore, knowing which family an enzyme belongs to can help infer its catalytic mechanism and provide information about its biological function. With the large number of protein sequences flooding into databanks in the post-genomic age, annotating the family of an enzyme is very important. Since experimental methods are costly, bioinformatics tools are a great help for accurately classifying enzyme families. In this review, we summarize the application of machine learning methods to the prediction of enzyme families from different aspects. We hope that this review will provide insights and inspiration for research on enzyme family classification.
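As a hedged illustration of the kind of pipeline such reviews survey, the sketch below encodes each enzyme by its amino-acid composition and trains a multi-class classifier over the six EC main classes. The sequences and labels are toy placeholders.

```python
# Sketch: represent each enzyme by its amino-acid composition and train a
# multi-class classifier over the six EC main classes (toy data only).
import numpy as np
from sklearn.svm import SVC

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aa_composition(sequence):
    """Fraction of each of the 20 standard amino acids in the sequence."""
    sequence = sequence.upper()
    counts = np.array([sequence.count(aa) for aa in AMINO_ACIDS], dtype=float)
    return counts / max(len(sequence), 1)

# Toy training data: (sequence fragment, EC main class 1-6).
train = [("MKTAYIAKQR", 3), ("GAVLIMCFYW", 1), ("STNQDEKRHG", 6), ("PPGAWSTNQD", 3)]
X = np.stack([aa_composition(seq) for seq, _ in train])
y = [ec for _, ec in train]

model = SVC(kernel="rbf").fit(X, y)
print("predicted EC class:", model.predict(aa_composition("MKAYQRSTND").reshape(1, -1)))
```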


Author(s):  
Mansoureh Maadi ◽  
Hadi Akbarzadeh Khorshidi ◽  
Uwe Aickelin

Objective: To provide a review of human–Artificial Intelligence (AI) interaction in Machine Learning (ML) applications, to inform how best to combine human domain expertise with the computational power of ML methods. The review focuses on the medical field, as the medical ML application literature highlights a special need for medical experts to collaborate with ML approaches. Methods: A scoping literature review is performed on Scopus and Google Scholar using the terms “human in the loop”, “human in the loop machine learning”, and “interactive machine learning”. Peer-reviewed papers published from 2015 to 2020 are included in our review. Results: We design four questions to investigate and describe human–AI interaction in ML applications: “Why should humans be in the loop?”, “Where does human–AI interaction occur in the ML processes?”, “Who are the humans in the loop?”, and “How do humans interact with ML in Human-In-the-Loop ML (HILML)?”. To answer the first question, we describe three main reasons for the importance of human involvement in ML applications. To address the second question, human–AI interaction is investigated at three main algorithmic stages: (1) data production and pre-processing; (2) ML modelling; and (3) ML evaluation and refinement. The importance of the expertise level of the humans in the loop is described to answer the third question. The extent of human interaction in HILML is grouped into three categories to address the fourth question. We conclude the paper by discussing open opportunities for future research in HILML.
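As a purely illustrative sketch (not taken from the reviewed literature), the snippet below shows where human checkpoints could be hooked into the three algorithmic stages listed above; the `human_check` callback is a hypothetical stand-in for any expert-facing interface.

```python
# Illustrative human-in-the-loop hooks at the three stages named above:
# data preparation, modelling, and evaluation/refinement.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def human_check(stage, payload):
    """Placeholder for an expert review step; here it just logs and approves."""
    print(f"[human-in-the-loop] {stage}: expert reviews {payload}")
    return True

rng = np.random.default_rng(0)
X, y = rng.normal(size=(300, 5)), rng.integers(0, 2, 300)

# Stage 1: data producing and pre-processing -- expert vets features and labels.
human_check("data pre-processing", {"n_samples": len(X), "n_features": X.shape[1]})

# Stage 2: ML modelling -- expert signs off on the model choice.
model = LogisticRegression(max_iter=500)
human_check("modelling", type(model).__name__)
model.fit(X, y)

# Stage 3: evaluation and refinement -- expert inspects metrics before deployment.
acc = accuracy_score(y, model.predict(X))
if human_check("evaluation", {"train_accuracy": round(acc, 3)}):
    print("model approved for the next iteration")
```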


2021 ◽  
Vol 13 (4) ◽  
pp. 94
Author(s):  
Haokun Fang ◽  
Quan Qian

With the great success of machine learning, privacy protection has become an important concern. In this paper, we propose a multi-party privacy-preserving machine learning framework, named PFMLP, based on partially homomorphic encryption and federated learning. The core idea is that all learning parties transmit only gradients encrypted with homomorphic encryption. In experiments, the model trained with PFMLP achieves almost the same accuracy, with a deviation of less than 1%. To reduce the computational overhead of homomorphic encryption, we use an improved Paillier algorithm that speeds up training by 25–28%. Moreover, the effects of encryption key length, learning network structure, number of learning clients, etc., are also compared and discussed in detail in the paper.
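A minimal sketch of the core idea, parties exchanging only homomorphically encrypted gradients that an aggregator sums without decrypting individual contributions, is shown below using the python-paillier (`phe`) package. The key length and gradient values are illustrative, and the improved Paillier variant from the paper is not reproduced here.

```python
# Sketch: parties share only Paillier-encrypted gradients; the aggregator
# sums ciphertexts (additive homomorphism) and never sees plaintext values.
from functools import reduce
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

# Each party computes a local gradient and encrypts it before sharing.
local_gradients = [
    [0.12, -0.40, 0.05],   # party A
    [0.08, -0.35, 0.10],   # party B
    [0.15, -0.42, 0.02],   # party C
]
encrypted = [[public_key.encrypt(g) for g in grads] for grads in local_gradients]

# The aggregator adds ciphertexts component-wise without decrypting them.
aggregated = [reduce(lambda a, b: a + b, col) for col in zip(*encrypted)]

# Only the key holder decrypts, and only the aggregate, then averages it.
n_parties = len(local_gradients)
averaged = [private_key.decrypt(c) / n_parties for c in aggregated]
print("averaged gradient:", [round(g, 4) for g in averaged])
```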

