scholarly journals Automatic Tuning of Rule-Based Evolutionary Machine Learning via Problem Structure Identification

2020 ◽  
Vol 15 (3) ◽  
pp. 28-46
Author(s):  
Maria A. Franco ◽  
Natalio Krasnogor ◽  
Jaume Bacardit
Author(s):  
Padmavathi .S ◽  
M. Chidambaram

Text classification has grown into more significant in managing and organizing the text data due to tremendous growth of online information. It does classification of documents in to fixed number of predefined categories. Rule based approach and Machine learning approach are the two ways of text classification. In rule based approach, classification of documents is done based on manually defined rules. In Machine learning based approach, classification rules or classifier are defined automatically using example documents. It has higher recall and quick process. This paper shows an investigation on text classification utilizing different machine learning techniques.


Author(s):  
Katherine Darveau ◽  
Daniel Hannon ◽  
Chad Foster

There is growing interest in the study and practice of applying data science (DS) and machine learning (ML) to automate decision making in safety-critical industries. As an alternative or augmentation to human review, there are opportunities to explore these methods for classifying aviation operational events by root cause. This study seeks to apply a thoughtful approach to design, compare, and combine rule-based and ML techniques to classify events caused by human error in aircraft/engine assembly, maintenance or operation. Event reports contain a combination of continuous parameters, unstructured text entries, and categorical selections. A Human Factors approach to classifier development prioritizes the evaluation of distinct data features and entry methods to improve modeling. Findings, including the performance of tested models, led to recommendations for the design of textual data collection systems and classification approaches.


2021 ◽  
Author(s):  
Abul Hasan ◽  
Mark Levene ◽  
David Weston ◽  
Renate Fromson ◽  
Nicolas Koslover ◽  
...  

BACKGROUND The COVID-19 pandemic has created a pressing need for integrating information from disparate sources, in order to assist decision makers. Social media is important in this respect, however, to make sense of the textual information it provides and be able to automate the processing of large amounts of data, natural language processing methods are needed. Social media posts are often noisy, yet they may provide valuable insights regarding the severity and prevalence of the disease in the population. In particular, machine learning techniques for triage and diagnosis could allow for a better understanding of what social media may offer in this respect. OBJECTIVE This study aims to develop an end-to-end natural language processing pipeline for triage and diagnosis of COVID-19 from patient-authored social media posts, in order to provide researchers and other interested parties with additional information on the symptoms, severity and prevalence of the disease. METHODS The text processing pipeline first extracts COVID-19 symptoms and related concepts such as severity, duration, negations, and body parts from patients’ posts using conditional random fields. An unsupervised rule-based algorithm is then applied to establish relations between concepts in the next step of the pipeline. The extracted concepts and relations are subsequently used to construct two different vector representations of each post. These vectors are applied separately to build support vector machine learning models to triage patients into three categories and diagnose them for COVID-19. RESULTS We report that Macro- and Micro-averaged F_{1\ }scores in the range of 71-96% and 61-87%, respectively, for the triage and diagnosis of COVID-19, when the models are trained on human labelled data. Our experimental results indicate that similar performance can be achieved when the models are trained using predicted labels from concept extraction and rule-based classifiers, thus yielding end-to-end machine learning. Also, we highlight important features uncovered by our diagnostic machine learning models and compare them with the most frequent symptoms revealed in another COVID-19 dataset. In particular, we found that the most important features are not always the most frequent ones. CONCLUSIONS Our preliminary results show that it is possible to automatically triage and diagnose patients for COVID-19 from natural language narratives using a machine learning pipeline, in order to provide additional information on the severity and prevalence of the disease through the eyes of social media.


2021 ◽  
Author(s):  
george chang ◽  
Nathaniel Woody ◽  
Christopher Keefer

Lipophilicity is a fundamental structural property that influences almost every aspect of drug discovery. Within Pfizer, we have two complementary high-throughput screens for measuring lipophilicity as a distribution coefficient (LogD) – a miniaturized shake-flask method (SFLogD) and a chromatographic method (ELogD). The results from these two assays are not the same (see Figure 1), with each assay being applicable or more reliable in particular chemical spaces. In addition to LogD assays, the ability to predict the LogD value for virtual compounds is equally vital. Here we present an in-silico LogD model, applicable to all chemical spaces, based on the integration of the LogD data from both assays. We developed two approaches towards a single LogD model – a Rule-based and a Machine Learning approach. Ultimately, the Machine Learning LogD model was found to be superior to both internally developed and commercial LogD models.<br>


Sign in / Sign up

Export Citation Format

Share Document