Advances in Electronic Phenotyping: From Rule-Based Definitions to Machine Learning Models

With the widespread adoption of electronic health records (EHRs), large repositories of structured and unstructured patient data are becoming available to conduct observational studies. Finding patients with specific conditions or outcomes, known as phenotyping, is one of the most fundamental research problems encountered when using these new EHR data. Phenotyping forms the basis of translational research, comparative effectiveness studies, clinical decision support, and population health analyses using routinely collected EHR data. We review the evolution of electronic phenotyping, from the early rule-based methods to the cutting edge of supervised and unsupervised machine learning models. We aim to cover the most influential papers in commensurate detail, with a focus on both methodology and implementation. Finally, future research directions are explored.
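As an illustration of what an early rule-based phenotype definition might look like, the following is a minimal sketch and not a definition taken from the review. The record layout, the function names, and the rule itself (at least two diagnosis codes with a given prefix plus one qualifying medication) are all hypothetical assumptions chosen for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    """Hypothetical, minimal view of one patient's structured EHR data."""
    patient_id: str
    diagnosis_codes: list[str] = field(default_factory=list)
    medications: set[str] = field(default_factory=set)


def matches_phenotype(record: PatientRecord, code_prefix: str,
                      min_code_count: int, qualifying_meds: set[str]) -> bool:
    """Illustrative rule: enough matching diagnosis codes AND a qualifying drug."""
    code_hits = sum(1 for code in record.diagnosis_codes
                    if code.startswith(code_prefix))
    has_medication = bool(record.medications & qualifying_meds)
    return code_hits >= min_code_count and has_medication


def select_cohort(records: list[PatientRecord], **rule) -> list[str]:
    """Return the IDs of all patients that satisfy the rule."""
    return [r.patient_id for r in records if matches_phenotype(r, **rule)]


# Made-up records; the thresholds below are assumptions, not validated criteria.
records = [
    PatientRecord("p1", ["E11.9", "E11.65"], {"metformin"}),
    PatientRecord("p2", ["E11.9"], {"metformin"}),
    PatientRecord("p3", ["E11.9", "E11.9"], {"lisinopril"}),
]
cohort = select_cohort(records, code_prefix="E11", min_code_count=2,
                       qualifying_meds={"metformin"})
print(cohort)
```

Rules of this kind are easy to audit but must be hand-tuned per condition, which is one motivation for the learned models the review goes on to cover.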


Triage and diagnosis of COVID-19 from medical social media (Preprint)

10.2196/preprints.30397 ◽

2021 ◽

Author(s):

Abul Hasan ◽

Mark Levene ◽

David Weston ◽

Renate Fromson ◽

Nicolas Koslover ◽

...

Keyword(s):

Machine Learning ◽

Social Media ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Learning Models ◽

Rule Based ◽

Additional Information ◽

Processing Pipeline ◽

Machine Learning Models

BACKGROUND The COVID-19 pandemic has created a pressing need for integrating information from disparate sources in order to assist decision makers. Social media is important in this respect; however, to make sense of the textual information it provides and to automate the processing of large amounts of data, natural language processing methods are needed. Social media posts are often noisy, yet they may provide valuable insights regarding the severity and prevalence of the disease in the population. In particular, machine learning techniques for triage and diagnosis could allow for a better understanding of what social media may offer in this respect. OBJECTIVE This study aims to develop an end-to-end natural language processing pipeline for triage and diagnosis of COVID-19 from patient-authored social media posts, in order to provide researchers and other interested parties with additional information on the symptoms, severity and prevalence of the disease. METHODS The text processing pipeline first extracts COVID-19 symptoms and related concepts such as severity, duration, negations, and body parts from patients’ posts using conditional random fields. An unsupervised rule-based algorithm is then applied to establish relations between concepts in the next step of the pipeline. The extracted concepts and relations are subsequently used to construct two different vector representations of each post. These vectors are applied separately to build support vector machine learning models to triage patients into three categories and diagnose them for COVID-19. RESULTS We report macro- and micro-averaged F1 scores in the range of 71-96% and 61-87%, respectively, for the triage and diagnosis of COVID-19 when the models are trained on human-labelled data.
Our experimental results indicate that similar performance can be achieved when the models are trained using predicted labels from concept extraction and rule-based classifiers, thus yielding end-to-end machine learning. Also, we highlight important features uncovered by our diagnostic machine learning models and compare them with the most frequent symptoms revealed in another COVID-19 dataset. In particular, we found that the most important features are not always the most frequent ones. CONCLUSIONS Our preliminary results show that it is possible to automatically triage and diagnose patients for COVID-19 from natural language narratives using a machine learning pipeline, in order to provide additional information on the severity and prevalence of the disease through the eyes of social media.
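The macro- and micro-averaged F1 scores reported above differ in how they aggregate over classes. The following sketch shows the two standard definitions in plain Python; it is not the authors' evaluation code, and the three class names and the labels in the example are made up for illustration.

```python
from collections import Counter


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    # Macro: average the per-class F1 scores, so every class weighs the same.
    macro = sum(f1_from_counts(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all classes first, so every post weighs the same.
    micro = f1_from_counts(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Hypothetical triage labels; the class names are placeholders.
y_true = ["low", "low", "low", "low", "medium", "high"]
y_pred = ["low", "low", "low", "low", "high", "medium"]
macro, micro = macro_micro_f1(y_true, y_pred)
print(f"macro F1 = {macro:.3f}, micro F1 = {micro:.3f}")
```

In this toy example the frequent class is predicted perfectly and the two rare classes are not, so the micro average is well above the macro average. For single-label classification the micro-averaged F1 coincides with accuracy.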


Evaluation of rule-based, CountVectorizer, and Word2Vec machine learning models for tweet analysis to improve disaster relief

10.1109/ghtc53159.2021.9612486 ◽

2021 ◽

Author(s):

Radhika Goyal

Keyword(s):

Machine Learning ◽

Disaster Relief ◽

Learning Models ◽

Rule Based ◽

Machine Learning Models


EFFECTIVE COMMUNICATION OF MACHINE LEARNING MODELS IN CLINICAL DECISION SUPPORT TOOLS FOR PAH RISK PREDICTION

CHEST Journal ◽

10.1016/j.chest.2021.07.1277 ◽

2021 ◽

Vol 160 (4) ◽

pp. A1396-A1397

Author(s):

Raymond Benza ◽

Manreet Kanwar ◽

James Antaki ◽

Aditi Dhabalia ◽

Mia Manavalan ◽

...

Keyword(s):

Machine Learning ◽

Decision Support ◽

Risk Prediction ◽

Clinical Decision Support ◽

Clinical Decision ◽

Decision Support Tools ◽

Learning Models ◽

Support Tools ◽

Clinical Decision Support Tools ◽

Machine Learning Models


Machine Learning Approach to Reduce Alert Fatigue Using a Disease Medication–Related Clinical Decision Support System: Model Development and Validation (Preprint)

10.2196/preprints.19489 ◽

2020 ◽

Author(s):

Tahmina Nasrin Poly ◽

Md.Mohaimenul Islam ◽

Muhammad Solihuddin Muhtar ◽

Hsuan-Chia Yang ◽

Phung Anh (Alex) Nguyen ◽

...

Keyword(s):

Machine Learning ◽

Decision Support ◽

Clinical Decision Support ◽

Prediction Models ◽

Clinical Decision ◽

Gradient Boosting ◽

Support Vector ◽

Learning Models ◽

Alert Fatigue ◽

Machine Learning Models

BACKGROUND Computerized physician order entry (CPOE) systems are incorporated into clinical decision support systems (CDSSs) to reduce medication errors and improve patient safety. Automatic alerts generated from CDSSs can directly assist physicians in making useful clinical decisions and can help shape prescribing behavior. Multiple studies reported that approximately 90%-96% of alerts are overridden by physicians, which raises questions about the effectiveness of CDSSs. There is intense interest in developing sophisticated methods to combat alert fatigue, but there is no consensus on the optimal approaches so far. OBJECTIVE Our objective was to develop machine learning prediction models to predict physicians’ responses in order to reduce alert fatigue from disease medication–related CDSSs. METHODS We collected data from a disease medication–related CDSS from a university teaching hospital in Taiwan. We considered prescriptions that triggered alerts in the CDSS between August 2018 and May 2019. Machine learning models, such as artificial neural network (ANN), random forest (RF), naïve Bayes (NB), gradient boosting (GB), and support vector machine (SVM), were used to develop prediction models. The data were randomly split into training (80%) and testing (20%) datasets. RESULTS A total of 6453 prescriptions were used in our model. The ANN machine learning prediction model demonstrated excellent discrimination (area under the receiver operating characteristic curve [AUROC] 0.94; accuracy 0.85), whereas the RF, NB, GB, and SVM models had AUROCs of 0.93, 0.91, 0.91, and 0.80, respectively. The sensitivity and specificity of the ANN model were 0.87 and 0.83, respectively. CONCLUSIONS In this study, ANN showed substantially better performance in predicting individual physician responses to an alert from a disease medication–related CDSS, as compared to the other models. 
To our knowledge, this is the first study to use machine learning models to predict physician responses to alerts; furthermore, it can help to develop sophisticated CDSSs in real-world clinical settings.
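The sensitivity, specificity, and accuracy figures quoted in the abstract are all derived from a binary confusion matrix. The sketch below shows those standard formulas; the class name and the counts are illustrative assumptions and are not the study's data, which the abstract does not give at this level of detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier on a held-out test set."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives

    @property
    def sensitivity(self) -> float:
        """True positive rate: TP / (TP + FN)."""
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        """True negative rate: TN / (TN + FP)."""
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        """Share of all cases classified correctly."""
        total = self.tp + self.fp + self.tn + self.fn
        return (self.tp + self.tn) / total


# Made-up counts for a balanced test set of 200 cases (an assumption).
cm = ConfusionMatrix(tp=87, fp=17, tn=83, fn=13)
print(f"sensitivity={cm.sensitivity:.2f} "
      f"specificity={cm.specificity:.2f} accuracy={cm.accuracy:.2f}")
```

Unlike these threshold-dependent metrics, AUROC summarizes discrimination across all decision thresholds, which is why the abstract reports it separately.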


A Universal Screening Tool for Dyslexia by a Web-Game and Machine Learning

Frontiers in Computer Science ◽

10.3389/fcomp.2021.628634 ◽

2022 ◽

Vol 3 ◽

Author(s):

Maria Rauschenberger ◽

Ricardo Baeza-Yates ◽

Luz Rello

Keyword(s):

Machine Learning ◽

Early Intervention ◽

Universal Screening ◽

User Study ◽

Future Research ◽

Learning Models ◽

Early Screening ◽

Web Based ◽

Negative Side Effects ◽

Machine Learning Models

Children with dyslexia have difficulties learning how to read and write. They are often diagnosed only after they fail in school, even though dyslexia is not related to general intelligence. Early screening of dyslexia can prevent the negative side effects of late detection and enables early intervention. In this context, we present an approach for universal screening of dyslexia using machine learning models with data gathered from a web-based language-independent game. We designed the game content taking into consideration the analysis of mistakes of people with dyslexia in different languages and other parameters related to dyslexia, such as auditory and visual perception. We conducted a user study with 313 children (116 with dyslexia) and trained predictive machine learning models with the collected data. Our method yields an accuracy of 0.74 for German and 0.69 for Spanish as well as an F1-score of 0.75 for German and 0.75 for Spanish, using Random Forests and Extra Trees, respectively. We also present the game content design, potential new auditory input, and knowledge about the design approach for future research to explore universal screening of dyslexia. Universal screening with language-independent content can be used for the screening of pre-readers who do not have any language skills, facilitating a potential early intervention.


A Systematised State-of-the-Art Review of Machine Learning Models to Aid Clinical Decision-Making in Epilepsy

SSRN Electronic Journal ◽

10.2139/ssrn.3541140 ◽

2020 ◽

Author(s):

Edward Jonathan Han-Burgess ◽

Richard J. Stevens

Keyword(s):

Machine Learning ◽

Decision Making ◽

Clinical Decision Making ◽

State Of The Art ◽

Clinical Decision ◽

Learning Models ◽

Machine Learning Models


Interpretable Machine Learning Models for Clinical Decision-Making in a High-Need, Value-Based Primary Care Setting

NEJM Catalyst ◽

10.1056/cat.21.0008 ◽

2021 ◽

Vol 2 (4) ◽

Author(s):

Surabhi Bhatt ◽

Adam Cohon ◽

Jenna Rose ◽

Natalia Majerczyk ◽

Brian Cozzi ◽

...

Keyword(s):

Machine Learning ◽

Primary Care ◽

Decision Making ◽

Care Setting ◽

Clinical Decision Making ◽

Primary Care Setting ◽

Clinical Decision ◽

Learning Models ◽

Interpretable Machine Learning ◽

Machine Learning Models


Functional networks inference from rule-based machine learning models

BioData Mining ◽

10.1186/s13040-016-0106-4 ◽

2016 ◽

Vol 9 (1) ◽

Cited By ~ 4

Author(s):

Nicola Lazzarini ◽

Paweł Widera ◽

Stuart Williamson ◽

Rakesh Heer ◽

Natalio Krasnogor ◽

...

Keyword(s):

Machine Learning ◽

Functional Networks ◽

Learning Models ◽

Rule Based ◽

Machine Learning Models


Risk Assessment in Energy Infrastructure Installations by Horizontal Directional Drilling Using Machine Learning

Energies ◽

10.3390/en14020289 ◽

2021 ◽

Vol 14 (2) ◽

pp. 289

Author(s):

Maria Krechowicz ◽

Adam Krechowicz

Keyword(s):

Machine Learning ◽

Risk Assessment ◽

Assessment Process ◽

Future Research ◽

Directional Drilling ◽

Ann Model ◽

Learning Models ◽

Horizontal Directional Drilling ◽

Risk Assessment Process ◽

Machine Learning Models

There is a growing demand for installations of new gas pipelines in Europe. A large number of them are installed using trenchless Horizontal Directional Drilling (HDD) technology. The aim of this work was to develop and compare new machine learning models dedicated to risk assessment in HDD projects. Data from 133 HDD projects in eight countries were gathered, profiled, and preprocessed. Three machine learning models, logistic regression, random forests, and an Artificial Neural Network (ANN), were developed to predict the overall HDD project outcome (failure-free installation or installation likely to fail) and the occurrence of identified unwanted events. The best performance in terms of recall and accuracy was achieved by the ANN model, which proved to be efficient, fast and robust in predicting risks in HDD projects. Applying machine learning in the proposed models eliminated the need to involve a group of experts in the risk assessment process and therefore significantly lowered the costs associated with it. Future research may be oriented towards developing a comprehensive risk management system that enables dynamic risk assessment taking into account various combinations of risk mitigation actions.


Machine Learning Approach to Reduce Alert Fatigue Using a Disease Medication–Related Clinical Decision Support System: Model Development and Validation

JMIR Medical Informatics ◽

10.2196/19489 ◽

2020 ◽

Vol 8 (11) ◽

pp. e19489

Author(s):

Tahmina Nasrin Poly ◽

Md.Mohaimenul Islam ◽

Muhammad Solihuddin Muhtar ◽

Hsuan-Chia Yang ◽

Phung Anh (Alex) Nguyen ◽

...

Keyword(s):

Machine Learning ◽

Decision Support ◽

Clinical Decision Support ◽

Prediction Models ◽

Clinical Decision ◽

Gradient Boosting ◽

Support Vector ◽

Learning Models ◽

Alert Fatigue ◽

Machine Learning Models

Background Computerized physician order entry (CPOE) systems are incorporated into clinical decision support systems (CDSSs) to reduce medication errors and improve patient safety. Automatic alerts generated from CDSSs can directly assist physicians in making useful clinical decisions and can help shape prescribing behavior. Multiple studies reported that approximately 90%-96% of alerts are overridden by physicians, which raises questions about the effectiveness of CDSSs. There is intense interest in developing sophisticated methods to combat alert fatigue, but there is no consensus on the optimal approaches so far. Objective Our objective was to develop machine learning prediction models to predict physicians’ responses in order to reduce alert fatigue from disease medication–related CDSSs. Methods We collected data from a disease medication–related CDSS from a university teaching hospital in Taiwan. We considered prescriptions that triggered alerts in the CDSS between August 2018 and May 2019. Machine learning models, such as artificial neural network (ANN), random forest (RF), naïve Bayes (NB), gradient boosting (GB), and support vector machine (SVM), were used to develop prediction models. The data were randomly split into training (80%) and testing (20%) datasets. Results A total of 6453 prescriptions were used in our model. The ANN machine learning prediction model demonstrated excellent discrimination (area under the receiver operating characteristic curve [AUROC] 0.94; accuracy 0.85), whereas the RF, NB, GB, and SVM models had AUROCs of 0.93, 0.91, 0.91, and 0.80, respectively. The sensitivity and specificity of the ANN model were 0.87 and 0.83, respectively. Conclusions In this study, ANN showed substantially better performance in predicting individual physician responses to an alert from a disease medication–related CDSS, as compared to the other models. 
To our knowledge, this is the first study to use machine learning models to predict physician responses to alerts; furthermore, it can help to develop sophisticated CDSSs in real-world clinical settings.
