Functional networks inference from rule-based machine learning models

2016 ◽  
Vol 9 (1) ◽  
Author(s):  
Nicola Lazzarini ◽  
Paweł Widera ◽  
Stuart Williamson ◽  
Rakesh Heer ◽  
Natalio Krasnogor ◽  
...  

2021 ◽  
Author(s):  
Abul Hasan ◽  
Mark Levene ◽  
David Weston ◽  
Renate Fromson ◽  
Nicolas Koslover ◽  
...  

BACKGROUND The COVID-19 pandemic has created a pressing need for integrating information from disparate sources in order to assist decision makers. Social media is important in this respect; however, to make sense of the textual information it provides and to automate the processing of large amounts of data, natural language processing methods are needed. Social media posts are often noisy, yet they may provide valuable insights regarding the severity and prevalence of the disease in the population. In particular, machine learning techniques for triage and diagnosis could allow for a better understanding of what social media may offer in this respect.

OBJECTIVE This study aims to develop an end-to-end natural language processing pipeline for triage and diagnosis of COVID-19 from patient-authored social media posts, in order to provide researchers and other interested parties with additional information on the symptoms, severity, and prevalence of the disease.

METHODS The text processing pipeline first extracts COVID-19 symptoms and related concepts, such as severity, duration, negations, and body parts, from patients' posts using conditional random fields. An unsupervised rule-based algorithm is then applied to establish relations between concepts in the next step of the pipeline. The extracted concepts and relations are subsequently used to construct two different vector representations of each post. These vectors are used separately to build support vector machine models to triage patients into three categories and to diagnose them for COVID-19.

RESULTS We report Macro- and Micro-averaged F1 scores in the range of 71-96% and 61-87%, respectively, for the triage and diagnosis of COVID-19 when the models are trained on human-labelled data. Our experimental results indicate that similar performance can be achieved when the models are trained using predicted labels from the concept extraction and rule-based classifiers, thus yielding an end-to-end machine learning pipeline. We also highlight important features uncovered by our diagnostic machine learning models and compare them with the most frequent symptoms revealed in another COVID-19 dataset. In particular, we found that the most important features are not always the most frequent ones.

CONCLUSIONS Our preliminary results show that it is possible to automatically triage and diagnose patients for COVID-19 from natural language narratives using a machine learning pipeline, in order to provide additional information on the severity and prevalence of the disease through the eyes of social media.
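As a rough, illustrative sketch of the final stage of such a pipeline (not the authors' implementation), the snippet below turns upstream concept annotations into bag-of-concepts vectors and fits a linear SVM; all concept strings and triage labels are invented placeholders.

```python
# Minimal sketch of the classification stage described above: extracted concept
# annotations become bag-of-concepts vectors that feed a support vector machine.
# The concepts and triage labels are illustrative placeholders, not study data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Each post is represented by the concepts extracted upstream (e.g. by a CRF),
# joined into a single string so a standard vectorizer can consume them.
posts_as_concepts = [
    "fever severe cough duration:3days chest",
    "loss_of_smell mild fatigue",
    "shortness_of_breath severe chest negation:no_fever",
    "headache mild duration:1day",
]
triage_labels = ["severe", "mild", "severe", "mild"]  # hypothetical categories

model = make_pipeline(CountVectorizer(token_pattern=r"\S+"), LinearSVC())
model.fit(posts_as_concepts, triage_labels)

print(model.predict(["cough severe chest duration:5days"]))
```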


2018 ◽  
Vol 1 (1) ◽  
pp. 53-68 ◽  
Author(s):  
Juan M. Banda ◽  
Martin Seneviratne ◽  
Tina Hernandez-Boussard ◽  
Nigam H. Shah

With the widespread adoption of electronic health records (EHRs), large repositories of structured and unstructured patient data are becoming available to conduct observational studies. Finding patients with specific conditions or outcomes, known as phenotyping, is one of the most fundamental research problems encountered when using these new EHR data. Phenotyping forms the basis of translational research, comparative effectiveness studies, clinical decision support, and population health analyses using routinely collected EHR data. We review the evolution of electronic phenotyping, from the early rule-based methods to the cutting edge of supervised and unsupervised machine learning models. We aim to cover the most influential papers in commensurate detail, with a focus on both methodology and implementation. Finally, future research directions are explored.
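To make the rule-based versus machine-learning contrast concrete, here is a minimal sketch on a synthetic EHR-like table; the codes, thresholds, and column names are illustrative assumptions rather than a validated phenotype definition.

```python
# Toy illustration of the two phenotyping styles contrasted above. All values
# below are synthetic and the rule is an assumption chosen for illustration.
import pandas as pd
from sklearn.linear_model import LogisticRegression

ehr = pd.DataFrame({
    "patient_id":   [1, 2, 3, 4],
    "icd10":        ["E11.9", "I10", "E11.65", "J45.909"],  # diagnosis codes
    "hba1c":        [8.1, 5.4, 9.3, 5.1],                   # lab value (%)
    "on_metformin": [True, False, True, False],
})

# Rule-based phenotype: a diagnosis-code prefix plus a lab threshold or a medication.
is_t2dm = ehr["icd10"].str.startswith("E11") & ((ehr["hba1c"] >= 6.5) | ehr["on_metformin"])
print(ehr.loc[is_t2dm, "patient_id"].tolist())

# Supervised alternative: learn the same decision from labelled examples,
# using the structured fields as features (here just a minimal logistic model).
X = pd.get_dummies(ehr[["icd10", "on_metformin"]]).join(ehr[["hba1c"]])
y = is_t2dm.astype(int)  # in practice, chart-reviewed labels would be used
clf = LogisticRegression().fit(X, y)
print(clf.predict(X))
```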


Information ◽  
2020 ◽  
Vol 12 (1) ◽  
pp. 4
Author(s):  
Vlad Achimescu ◽  
Pavel Dimitrov Chachev

The truth value of any new piece of information is not only investigated by media platforms but also debated intensely on internet forums. Forum users are fighting back against misinformation by informally flagging suspicious posts as false or misleading in their comments. We propose extracting posts informally flagged by Reddit users as a means to narrow down the list of potential instances of disinformation. To identify these flags, we built a dictionary enhanced with part-of-speech tags and dependency parsing to filter out specific phrases. Our rule-based approach performs similarly to machine learning models but offers more transparency and interactivity. Posts matched by our technique are presented in a publicly accessible, daily updated, and customizable dashboard. This paper offers a descriptive analysis of which topics, venues, and time periods were linked to perceived misinformation in the first half of 2020, and compares user-flagged sources with an external dataset of unreliable news websites. Using this method can help researchers understand how truth and falsehood are perceived in subreddit communities and identify new false narratives before they spread through the larger population.
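A minimal sketch of this style of flag detection, assuming spaCy with an English model installed; the phrase dictionary and negation filter are illustrative stand-ins for the authors' actual rules.

```python
# Rough sketch of dictionary-based flag detection refined with POS tags and a
# dependency check that drops negated matches. The phrase list and negation rule
# are illustrative assumptions, not the authors' dictionary.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)

matcher.add("FLAG", [
    # adjective from the dictionary followed by "news"/"information"/"story"
    [{"LEMMA": {"IN": ["fake", "false", "misleading"]}, "POS": "ADJ"},
     {"LEMMA": {"IN": ["news", "information", "story"]}}],
    # "this is misinformation / disinformation / propaganda"
    [{"LOWER": {"IN": ["this", "that", "it"]}}, {"LEMMA": "be"},
     {"LEMMA": {"IN": ["misinformation", "disinformation", "propaganda"]}}],
])

def is_flagged(text: str) -> bool:
    doc = nlp(text)
    for _, start, end in matcher(doc):
        span = doc[start:end]
        # dependency-based filter: skip matches whose head is negated ("not fake news")
        if any(child.dep_ == "neg" for child in span.root.head.children):
            continue
        return True
    return False

print(is_flagged("Careful, this is disinformation spread by bots."))  # expected True
print(is_flagged("This is not fake news, the report checks out."))    # ideally False
```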


2020 ◽  
Vol 2 (1) ◽  
pp. 3-6
Author(s):  
Eric Holloway

Imagination Sampling is the use of a person as an oracle for generating or improving machine learning models. Previous work demonstrated a general system that uses Imagination Sampling to obtain multibox models. Here, the possibility of importing such models as the starting point for further automatic enhancement is explored.

