PRM94 - Aligning Text Mining and Machine Learning Algorithms with Best Practices for Study Selection in Systematic Literature Reviews

2018, Vol 21, pp. S371
Author(s): E. Popoff, J. P. Jansen, M. Besada, S. Cope, S. Kanters

2020, Vol 9 (1)
Author(s): E. Popoff, M. Besada, J. P. Jansen, S. Cope, S. Kanters

Abstract

Background: Despite existing research on text mining and machine learning (ML) for title and abstract screening, the role of ML within systematic literature reviews (SLRs) for health technology assessment (HTA) remains unclear, given the lack of extensive testing and of guidance from HTA agencies. We sought to address two knowledge gaps: to extend ML algorithms to provide a reason for exclusion, in line with current practice, and to determine optimal parameter settings for feature-set generation and ML algorithms.

Methods: We used abstract and full-text selection data from five large SLRs (n = 3,089 to 12,769 abstracts) across a variety of disease areas. Each SLR was split into training and test sets. We developed a multi-step algorithm to categorize each citation as included, excluded for a specific PICOS criterion, or unclassified. We used a bag-of-words approach for feature-set generation and compared support vector machines (SVMs), naïve Bayes (NB), and bagged classification and regression trees (CART) as classification algorithms. We also compared alternative training-set strategies: using the full data versus downsampling (i.e., reducing the number of excludes to balance includes and excludes, since ML algorithms perform better on balanced data), and using inclusion/exclusion decisions from abstract versus full-text screening. Performance was compared in terms of sensitivity, specificity, accuracy, and agreement with the reviewers' reason for exclusion.

Results: The best-fitting model (optimized for sensitivity and specificity) was based on the SVM algorithm, trained on full-text decisions with downsampling and with words occurring fewer than five times excluded. Across the five SLRs, its sensitivity ranged from 94 to 100% and its specificity from 54 to 89%. On average, 75% of excluded citations were excluded with a reason, and 83% of these matched the reviewers' original reason for exclusion. Sensitivity improved significantly when both downsampling and abstract decisions were used.

Conclusions: ML algorithms can improve the efficiency of the SLR process, and the proposed algorithms could reduce the workload of a second reviewer by identifying exclusions with a relevant PICOS reason, thus aligning with HTA guidance. Downsampling can be used to improve study selection, and the improvement obtained with full-text exclusions has implications for a learn-as-you-go approach.
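The pipeline described in the Methods maps onto a standard text-classification stack. Below is a minimal sketch assuming scikit-learn and pandas; the file names, the column names ("abstract", "label"), and the choice of LinearSVC as the SVM are illustrative assumptions rather than the authors' actual implementation. The min_df=5 setting mirrors the paper's exclusion of words occurring fewer than five times, and labels are assumed to be "include" or a PICOS exclusion reason.

```python
# Sketch of a screening classifier: bag-of-words features, downsampled
# training data, and a linear SVM predicting "include" vs. PICOS reasons.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def downsample(df, label_col="label", include_label="include", seed=0):
    """Randomly drop excluded citations so includes and excludes are balanced
    (assumes excludes outnumber includes, as is typical in screening data)."""
    includes = df[df[label_col] == include_label]
    excludes = df[df[label_col] != include_label].sample(
        n=len(includes), random_state=seed
    )
    return pd.concat([includes, excludes]).sample(frac=1, random_state=seed)

train = downsample(pd.read_csv("train_citations.csv"))  # hypothetical file

model = make_pipeline(
    CountVectorizer(min_df=5),  # drop words occurring fewer than five times
    LinearSVC(),                # multi-class SVM over include/exclusion labels
)
model.fit(train["abstract"], train["label"])

test = pd.read_csv("test_citations.csv")                # hypothetical file
test["predicted"] = model.predict(test["abstract"])
```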


2017, Vol 28, pp. v518
Author(s): H-L. Wong, T. Luechtefeld, A. Prawira, Z. Patterson, J. Workman, ...

2021
Author(s): Burak Kolukisa, Bilge Kagan Dedeturk, Beyhan Adanur Dedeturk, Abdulkadir Gulsen, Gokhan Bakal

Author(s): Durmuş Özkan Şahin, Erdal Kılıç

In this study, the authors present both theoretical and experimental material on text mining, one of the topics of natural language processing. Three text mining problems are addressed for Turkish: news classification, sentiment analysis, and author recognition. The aim is to reduce the running time and increase the performance of machine learning algorithms. Four machine learning algorithms and two feature selection metrics are used to solve these text classification problems. The classification algorithms are random forest (RF), logistic regression (LR), naïve Bayes (NB), and sequential minimal optimization (SMO); chi-square and information gain are used as the feature selection metrics. The highest classification performance achieved in this study is an F-measure of 0.895, obtained with the SMO classifier and the information gain metric on the news classification task. The study's contribution lies in its comparison of the performance of classification algorithms and feature selection methods.
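To make the comparison grid concrete, here is a hedged sketch in Python with scikit-learn (the study itself does not specify this tooling). LinearSVC stands in for SMO, which is an SVM training algorithm, and mutual_info_classif stands in for information gain; the TF-IDF features, the k=1000 feature count, and the evaluate helper are assumptions for illustration.

```python
# Sketch: cross-validated F-measure for every (feature-selection, classifier)
# pair, in the spirit of the comparison described above.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

classifiers = {
    "RF": RandomForestClassifier(),
    "LR": LogisticRegression(max_iter=1000),
    "NB": MultinomialNB(),
    "SMO (linear SVM)": LinearSVC(),
}
selectors = {
    "chi-square": chi2,
    "information gain": mutual_info_classif,  # close analogue of info gain
}

def evaluate(texts, labels, k=1000):
    """Report macro F-measure for each metric/classifier combination.
    Assumes the vocabulary has at least k terms."""
    for sel_name, score_fn in selectors.items():
        for clf_name, clf in classifiers.items():
            pipe = make_pipeline(
                TfidfVectorizer(),
                SelectKBest(score_fn, k=k),  # keep the k best-scoring terms
                clf,
            )
            f1 = cross_val_score(pipe, texts, labels, scoring="f1_macro").mean()
            print(f"{sel_name:16s} {clf_name:18s} F-measure = {f1:.3f}")
```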


2017
Author(s): Woo-Young Ahn, Paul Hendricks, Nathaniel Haines

Abstract

The easyml (easy machine learning) package lowers the barrier to entry to machine learning. It is ideal for undergraduate and graduate students, and for practitioners who want to quickly apply machine learning algorithms to their research without having to worry about the best practices for implementing each algorithm. The package provides standardized recipes for regression and classification algorithms in R and Python and implements them in a functional, modular, and extensible framework. It currently includes recipes for several common machine learning algorithms (e.g., penalized linear models, random forests, and support vector machines) and provides a unified interface to each. Importantly, users can run and evaluate each machine learning algorithm with a single line of code. Each recipe is robust, implements best practices specific to its algorithm, and generates a report with details about the model and its performance, along with journal-quality visualizations. The package's functional, modular, and extensible framework also allows researchers and more advanced users to easily implement new recipes for other algorithms.
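The single-call "recipe" is the package's central design idea. The sketch below illustrates that pattern in spirit using scikit-learn rather than easyml itself; the helper name easy_random_forest and the prostate/lpsa example are hypothetical stand-ins and should not be read as the package's verified API.

```python
# Illustrative "one-line recipe": split, fit, evaluate, and report with
# sensible defaults, hiding the boilerplate from the end user.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

def easy_random_forest(data: pd.DataFrame, dependent_variable: str):
    """Fit a random forest regressor and return the model plus a small
    train/test performance report (hypothetical easyml-style helper)."""
    y = data[dependent_variable]
    X = data.drop(columns=[dependent_variable])
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = RandomForestRegressor(random_state=0).fit(X_train, y_train)
    report = {
        "train_r2": r2_score(y_train, model.predict(X_train)),
        "test_r2": r2_score(y_test, model.predict(X_test)),
    }
    return model, report

# One line for the end user, as the package advertises:
# model, report = easy_random_forest(pd.read_csv("prostate.csv"), "lpsa")
```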


2020
Author(s): Ross Jacobucci, Andrew K. Littlefield, Alex J. Millner, Evan Kleiman, Douglas Steinley

Machine learning is being used at an increasing rate in clinical psychology. Applying machine learning comes with a number of challenges, both in deciding which algorithms to test and in how to evaluate predictive performance. We focus on this last component, demonstrating with both a simulation and an empirical example that the method researchers choose for evaluating prediction with machine learning can have large consequences for the substantive conclusions. More specifically, we demonstrate that one evaluation method used repeatedly in clinical research, the optimism-corrected bootstrap, can produce extremely biased results when paired with specific machine learning algorithms. We conclude by providing recommendations for researchers and a discussion of additional best practices.
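For readers unfamiliar with the procedure under scrutiny, the optimism-corrected bootstrap estimates out-of-sample performance by subtracting an estimated "optimism" from the apparent (train-on-all, test-on-all) performance. The sketch below is a minimal illustration assuming scikit-learn-style estimators, NumPy arrays, and AUC as the metric; it is not the authors' code, and a production version would need to guard against resamples that miss a class.

```python
# Minimal optimism-corrected bootstrap for a classifier's AUC.
import numpy as np
from sklearn.base import clone
from sklearn.metrics import roc_auc_score

def optimism_corrected_auc(model, X, y, n_boot=200, seed=0):
    """Apparent AUC minus the average optimism across bootstrap refits."""
    rng = np.random.default_rng(seed)
    # Apparent performance: fit and evaluate on the same data (optimistic).
    apparent = roc_auc_score(y, clone(model).fit(X, y).predict_proba(X)[:, 1])
    optimism = []
    n = len(y)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)         # resample with replacement
        m = clone(model).fit(X[idx], y[idx])
        auc_boot = roc_auc_score(y[idx], m.predict_proba(X[idx])[:, 1])
        auc_orig = roc_auc_score(y, m.predict_proba(X)[:, 1])
        optimism.append(auc_boot - auc_orig)     # how much the refit flatters
    return apparent - float(np.mean(optimism))

# Example (hypothetical data):
# from sklearn.linear_model import LogisticRegression
# corrected = optimism_corrected_auc(LogisticRegression(max_iter=1000), X, y)
```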

