Filtering Method for Twitter Streaming Data Using Human-in-the-Loop Machine Learning

BACKGROUND With false medical information spreading rapidly on the Web, establishing the credibility of online sources of medical information has become a pressing necessity. The sheer number of websites that present questionable medical information as reliable and actionable advice, with possibly harmful effects, imposes an additional requirement on potential solutions: they have to scale to the size of the problem. Machine learning is one such solution which, when properly deployed, can be an effective tool in fighting medical disinformation on the Web.

OBJECTIVE We present a comprehensive framework for designing and curating machine learning training datasets for online medical information credibility assessment. We show how the annotation process should be constructed and what pitfalls should be avoided. Our main objective is to provide researchers from the medical and computer science communities with guidelines on how to construct datasets for machine learning models in various areas of medical information wars.

METHODS The key component of our approach is the active annotation process. We begin by outlining the annotation protocol for the curation of a high-quality training dataset, which can then be augmented and rapidly extended by applying the human-in-the-loop paradigm to machine learning training. To circumvent the cold start problem of insufficient gold standard annotations, we propose a pre-processing pipeline consisting of representation learning, clustering, and re-ranking of sentences, which accelerates the training process and makes better use of the human resources involved in the annotation.

RESULTS We collect over 10 000 annotations of sentences related to selected subjects (psychiatry, cholesterol, autism, antibiotics, vaccines, steroids, birth methods, food allergy testing) for less than $7 000, employing 9 highly qualified annotators (certified medical professionals), and we release this dataset to the general public. We develop an active annotation framework for more efficient annotation of non-credible medical statements. The results of the qualitative analysis support our claims of the efficacy of the presented method.

CONCLUSIONS A diverse set of incentives is driving the widespread dissemination of medical disinformation on the Web. An effective strategy for countering this spread is to use machine learning to establish the credibility of online medical information automatically. This, however, requires a thoughtful design of the training pipeline. In this paper we present a comprehensive framework of active annotation. In addition, we publish a large curated dataset of medical statements labelled as credible, non-credible, or neutral.
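The pre-processing pipeline named in METHODS (representation, clustering, re-ranking) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: bag-of-words counts stand in for learned representations, a greedy leader clustering stands in for the clustering step, and the re-ranking simply interleaves clusters so that annotators see one sentence per cluster before any cluster repeats. The function names, the similarity threshold, and the example sentences are hypothetical and are not taken from the released dataset.

```python
import math
import re
from collections import Counter
from itertools import zip_longest


def embed(sentence):
    """Toy representation: bag-of-words counts (stand-in for a learned embedding)."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(vectors, threshold=0.3):
    """Greedy leader clustering: join the first cluster whose leader is similar enough."""
    clusters = []  # each cluster is a list of indices; the first index is the leader
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])  # no similar leader found: start a new cluster
    return clusters


def rerank(sentences, threshold=0.3):
    """Interleave clusters so each topic is shown once before any topic repeats."""
    vectors = [embed(s) for s in sentences]
    clusters = cluster(vectors, threshold)
    order = []
    for round_ in zip_longest(*clusters):
        order.extend(i for i in round_ if i is not None)
    return [sentences[i] for i in order]


# Made-up example statements, used only to exercise the sketch.
sentences = [
    "Vaccines cause autism in children",
    "Vaccines do not cause autism",
    "Antibiotics cure viral infections",
    "Antibiotics do not work on viral infections",
    "Statins lower cholesterol",
]
queue = rerank(sentences)
for s in queue:
    print(s)
```

With these inputs the annotation queue starts with one sentence from each of the three topical clusters. In a human-in-the-loop setting the labels collected for those first items would then feed model training, although that step is outside this sketch.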

A Review on Human–AI Interaction in Machine Learning and Insights for Medical Applications

International Journal of Environmental Research and Public Health ◽ 10.3390/ijerph18042121 ◽ 2021 ◽ Vol 18 (4) ◽ pp. 2121

Author(s): Mansoureh Maadi ◽ Hadi Akbarzadeh Khorshidi ◽ Uwe Aickelin

Keyword(s): Machine Learning ◽ Future Research ◽ Computational Power ◽ Medical Field ◽ Interactive Machine Learning ◽ Human In The Loop ◽ Human Interactions ◽ Scoping Literature Review ◽ Domain Expertise ◽ Expertise Level

Objective: To provide a human–Artificial Intelligence (AI) interaction review for Machine Learning (ML) applications to inform how best to combine human domain expertise with the computational power of ML methods. The review focuses on the medical field, as the medical ML application literature highlights a particular need for medical experts to collaborate with ML approaches. Methods: A scoping literature review is performed on Scopus and Google Scholar using the terms “human in the loop”, “human in the loop machine learning”, and “interactive machine learning”. Peer-reviewed papers published from 2015 to 2020 are included in our review. Results: We design four questions to investigate and describe human–AI interaction in ML applications. These questions are “Why should humans be in the loop?”, “Where does human–AI interaction occur in the ML processes?”, “Who are the humans in the loop?”, and “How do humans interact with ML in Human-In-the-Loop ML (HILML)?”. To answer the first question, we describe three main reasons regarding the importance of human involvement in ML applications. To address the second question, human–AI interaction is investigated in three main algorithmic stages: 1. data production and pre-processing; 2. ML modelling; and 3. ML evaluation and refinement. The importance of the expertise level of the humans in human–AI interaction is described to answer the third question. The number of human interactions in HILML is grouped into three categories to address the fourth question. We conclude the paper by offering a discussion on open opportunities for future research in HILML.

Machine learning powered ellipsometry

Light Science & Applications ◽ 10.1038/s41377-021-00482-0 ◽ 2021 ◽ Vol 10 (1)

Author(s): Jinchao Liu ◽ Di Zhang ◽ Dianqiang Yu ◽ Mengxin Ren ◽ Jingjun Xu

Keyword(s): Machine Learning ◽ Optical Constants ◽ Optical Characterization ◽ Superior Performance ◽ Trial And Error ◽ Powerful Method ◽ Human In The Loop ◽ Ill Posed ◽ Fully Automatic

Ellipsometry is a powerful method for determining both the optical constants and thickness of thin films. For decades, solutions to ill-posed inverse ellipsometric problems have required substantial human-expert intervention and have become essentially human-in-the-loop trial-and-error processes that are not only tedious and time-consuming but also limit the applicability of ellipsometry. Here, we demonstrate a machine learning based approach for solving ellipsometric problems in an unambiguous and fully automatic manner while showing superior performance. The proposed approach is experimentally validated by using a broad range of films covering categories of metals, semiconductors, and dielectrics. This method is compatible with existing ellipsometers and paves the way for realizing the automatic, rapid, high-throughput optical characterization of films.

Machine Learning Models for Stock Prediction Using Real-Time Streaming Data

Learning and Analytics in Intelligent Systems - Biologically Inspired Techniques in Many-Criteria Decision Making ◽ 10.1007/978-3-030-39033-4_10 ◽ 2020 ◽ pp. 101-108

Author(s): Monalisa Jena ◽ Ranjan Kumar Behera ◽ Santanu Kumar Rath

Keyword(s): Machine Learning ◽ Real Time ◽ Streaming Data ◽ Learning Models ◽ Stock Prediction ◽ Machine Learning Models

Significant Impact of Improved Machine Learning Algorithm in The Processes of Large Data Sets

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽ 10.32628/cseit206133 ◽ 2020 ◽ pp. 458-467

Author(s): Virendra Tiwari ◽ Balendra Garg ◽ Uday Prakash Sharma

Keyword(s): Machine Learning ◽ Learning Algorithm ◽ Learning Algorithms ◽ Dynamic Environment ◽ Large Data ◽ Machine Learning Algorithms ◽ Streaming Data ◽ Machine Learning Techniques ◽ Machine Learning Algorithm ◽ Learning Mechanisms

Machine learning algorithms are capable of managing multi-dimensional data in a dynamic environment. Despite these vital features, some challenges remain. Machine learning algorithms still require additional mechanisms or procedures for predicting a large number of new classes while managing privacy. These deficiencies show that the reliable use of a machine learning algorithm relies on human experts, because raw data may complicate the learning process and lead to inaccurate results. Interpreting outcomes therefore requires expertise in machine learning mechanisms, which is a significant challenge. Machine learning techniques suffer from issues of high dimensionality, adaptability, distributed computing, scalability, streaming data, and duplication. The main issue of machine learning algorithms is their vulnerability in managing errors. Furthermore, machine learning techniques are also found to lack variability. This paper studies how the computational complexity of machine learning algorithms can be reduced by finding how to make predictions using an improved algorithm.

Human-in-the-loop applied machine learning

2017 IEEE International Conference on Big Data (Big Data) ◽ 10.1109/bigdata.2017.8257900 ◽ 2017 ◽ Cited By ~ 1

Author(s): Carla E. Brodley

Keyword(s): Machine Learning ◽ Human In The Loop ◽ Applied Machine Learning

“Just the Way You Are”: Linking Music Listening on Spotify and Personality

Social Psychological and Personality Science ◽ 10.1177/1948550620923228 ◽ 2020 ◽ pp. 194855062092322

Author(s): Ian Anderson ◽ Santiago Gil ◽ Clay Gibson ◽ Scott Wolf ◽ Will Shapiro ◽ ...

Keyword(s): Machine Learning ◽ Personality Traits ◽ Meta Analysis ◽ Streaming Data ◽ Self Report ◽ Big Five Personality ◽ Music Listening ◽ Listening Behavior ◽ Listening Behaviors ◽ Musical Preferences

Advances in digital technology have put music libraries at people’s fingertips, giving them immediate access to more music than ever before. Here we overcome limitations of prior research by leveraging ecologically valid streaming data: 17.6 million songs and over 662,000 hours of music listened to by 5,808 Spotify users spanning a 3-month period. Building on interactionist theories, we investigated the link between personality traits and music listening behavior, described by an extensive set of 211 mood, genre, demographic, and behavioral metrics. Findings from machine learning showed that the Big Five personality traits are predicted by musical preferences and habitual listening behaviors with moderate to high accuracy. Importantly, our work contrasts with a recent self-report-based meta-analysis, which suggested that personality traits play only a small role in musical preferences; rather, we show with big data and advanced machine learning methods that personality is indeed important and warrants continued rigorous investigation.
