Prediction and Classification of Rheumatoid Arthritis using Ensemble Machine Learning Approaches

Rheumatoid arthritis (RA) is a multifactorial, complex autoimmune disease that involves various genetic, environmental, and epigenetic factors. Systems biology approaches provide the means to study complex diseases by integrating different layers of biological information. Combining multiple data types can help compensate for missing or conflicting information and limit the possibility of false positives. In this work, we aim to unravel mechanisms governing the regulation of key transcription factors in RA and derive patient-specific models to gain more insight into disease heterogeneity and the response to treatment. We first apply machine learning to publicly available peripheral-blood transcriptomic datasets related to RA to create an RA-specific transcription factor (TF) co-regulatory network. The TF cooperativity network is subsequently enriched with signalling cascades and upstream regulators using a state-of-the-art, RA-specific molecular map. Then, the integrative network is used as a template to analyse patients’ data regarding their response to anti-TNF treatment and to identify master regulators and upstream cascades affected by the treatment. Finally, we use the Boolean formalism to simulate in silico subparts of the integrated network and identify combinations and conditions that can switch the identified TFs on or off, mimicking the effects of single and combined perturbations.
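The Boolean simulation step can be illustrated with a minimal sketch. It is not the authors' model: the three nodes (`signal_a`, `signal_b`, `tf`), their rules, and the helper names are hypothetical and chosen only to show how clamping inputs mimics single and combined perturbations under a synchronous update scheme.

```python
from itertools import product

# Hypothetical toy network: node names and rules are illustrative only.
RULES = {
    "signal_a": lambda s: s["signal_a"],                  # input node keeps its value
    "signal_b": lambda s: s["signal_b"],                  # input node keeps its value
    "tf": lambda s: s["signal_a"] and not s["signal_b"],  # TF is on iff A is on and B is off
}


def step(state, clamp=None):
    """Synchronous update of all nodes; clamped nodes are forced to fixed values."""
    nxt = {node: bool(rule(state)) for node, rule in RULES.items()}
    nxt.update(clamp or {})
    return nxt


def run_to_fixed_point(state, clamp=None, max_steps=50):
    """Iterate until the state stops changing; return None if no fixed point is found."""
    state = {**state, **(clamp or {})}
    for _ in range(max_steps):
        nxt = step(state, clamp)
        if nxt == state:
            return state
        state = nxt
    return None


def conditions_switching_on(target, inputs):
    """Enumerate every on/off combination of the inputs that leaves the target on."""
    hits = []
    for values in product([False, True], repeat=len(inputs)):
        clamp = dict(zip(inputs, values))
        start = {node: False for node in RULES}
        final = run_to_fixed_point(start, clamp)
        if final is not None and final[target]:
            hits.append(clamp)
    return hits


on_conditions = conditions_switching_on("tf", ["signal_a", "signal_b"])
```

In this toy case the only combination that switches `tf` on is `signal_a` on with `signal_b` off. A realistic network would have many more nodes, and asynchronous update schemes may give different attractors.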


Machine learning approaches to predict the Plant-associated phenotype of Xanthomonas strains

BMC Genomics ◽ 10.1186/s12864-021-08093-0 ◽ 2021 ◽ Vol 22 (1)

Author(s): Dennie te Molder ◽ Wasin Poncheewin ◽ Peter J. Schaap ◽ Jasper J. Koehorst

Keyword(s): Machine Learning ◽ Plant Pathogens ◽ De Novo ◽ Classification Algorithms ◽ Learning Approaches ◽ Enabling Factors ◽ Essential Step ◽ The World ◽ Genome Content

Background: The genus Xanthomonas has long been considered to consist predominantly of plant pathogens, but over the last decade there has been an increasing number of reports on non-pathogenic and endophytic members. As Xanthomonas species are prevalent pathogens on a wide variety of important crops around the world, there is a need to distinguish between these plant-associated phenotypes. To date, a large number of Xanthomonas genomes have been sequenced, which enables the application of machine learning (ML) approaches to the genome content to predict this phenotype. Until now, such approaches to the pathogenomics of Xanthomonas strains have been hampered by the fragmentation of information regarding the pathogenicity of individual strains over many studies. Unification of this information into a single resource was therefore considered an essential step. Results: Mining of 39 papers considering both plant-associated phenotypes allowed for a phenotypic classification of 578 Xanthomonas strains. For 65 plant-pathogenic and 53 non-pathogenic strains, the corresponding genomes were available and were annotated de novo for the presence of Pfam protein domains, which were used as features to train and compare three ML classification algorithms: CART, Lasso and Random Forest. Conclusion: The literature resource, in combination with recursive feature extraction used in the ML classification algorithms, provided further insights into the virulence-enabling factors, but also highlighted domains linked to traits not present in pathogenic strains.
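As a rough illustration of how domain annotations can become classifier features, the sketch below turns per-strain domain sets into a binary presence/absence matrix. The strain names, the placeholder domain identifiers (`PF_A`, `PF_B`, `PF_C`), the labels, and the function name are all hypothetical; the abstract does not specify the exact encoding the authors used.

```python
def presence_matrix(domains_by_strain):
    """Build a 0/1 matrix with one row per strain and one column per domain.

    domains_by_strain maps a strain name to the set of domain identifiers
    annotated in its genome. Rows and columns are sorted for reproducibility.
    """
    strains = sorted(domains_by_strain)
    domains = sorted(set().union(*domains_by_strain.values()))
    rows = [
        [1 if domain in domains_by_strain[strain] else 0 for domain in domains]
        for strain in strains
    ]
    return strains, domains, rows


# Hypothetical annotations and phenotype labels, for illustration only.
annotations = {
    "strain_1": {"PF_A", "PF_B"},
    "strain_2": {"PF_A"},
    "strain_3": {"PF_A", "PF_C"},
}
phenotype = {
    "strain_1": "pathogenic",
    "strain_2": "non-pathogenic",
    "strain_3": "pathogenic",
}

strains, domains, X = presence_matrix(annotations)
y = [phenotype[strain] for strain in strains]  # label vector aligned with the rows of X
```

A matrix of this shape, with its aligned label vector, is the kind of input that tree-based and penalised linear classifiers typically accept.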


An Ensemble Machine Learning Technique for Functional Requirement Classification

Symmetry ◽ 10.3390/sym12101601 ◽ 2020 ◽ Vol 12 (10) ◽ pp. 1601

Author(s): Nouf Rahimi ◽ Fathy Eassa ◽ Lamiaa Elrefaei

Keyword(s): Machine Learning ◽ Requirement Engineering ◽ Support Vector ◽ Functional Requirement ◽ Machine Learning Technique ◽ Engineering Software ◽ Ensemble Machine Learning ◽ Learning Technique ◽ Combined Models

In Requirement Engineering, software requirements are classified into two main categories: Functional Requirement (FR) and Non-Functional Requirement (NFR). FR describes user and system goals. NFR includes all constraints on services and functions. Deeper classification of those two categories facilitates the software development process. There are many techniques for classifying FR; some of them are Machine Learning (ML) techniques, and others are traditional. To date, the classification accuracy has not been satisfactory. In this paper, we introduce a new ensemble ML technique for classifying FR statements to improve their accuracy and availability. This technique combines different ML models and uses enhanced accuracy as a weight in the weighted ensemble voting approach. The five combined models are Naïve Bayes, Support Vector Machine (SVM), Decision Tree, Logistic Regression, and Support Vector Classification (SVC). The technique was implemented, trained, and tested using a collected dataset. The accuracy of classifying FR was 99.45%, and the required time was 0.7 s.
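A weighted ensemble vote of this kind can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each model's weight is simply an accuracy score measured on held-out data, and the model names, weights, and class labels are hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the largest summed weight.

    predictions maps a model name to its predicted label;
    weights maps a model name to its weight (assumed here to be an accuracy score).
    Ties are broken by label order so the result is deterministic.
    """
    totals = defaultdict(float)
    for model, label in predictions.items():
        totals[label] += weights[model]
    return max(sorted(totals), key=totals.get)


# Hypothetical accuracy-based weights for three of the combined models.
weights = {"naive_bayes": 0.80, "svm": 0.90, "decision_tree": 0.70}

# Two weaker models agreeing can outvote the single strongest model.
label = weighted_vote(
    {"naive_bayes": "class_a", "svm": "class_b", "decision_tree": "class_a"},
    weights,
)
```

Here `class_a` collects a total weight of 1.5 against 0.9 for `class_b`, so the ensemble returns `class_a` even though the most accurate single model disagreed.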


Classification of Neurotransmitter Response in Dynamic PET Data Using Machine Learning Approaches

IEEE Transactions on Radiation and Plasma Medical Sciences ◽ 10.1109/trpms.2020.2984259 ◽ 2020 ◽ Vol 4 (6) ◽ pp. 708-719 ◽ Cited By ~ 2

Author(s): Oliver K. Fuller ◽ Georgios I. Angelis ◽ Steven R. Meikle

Keyword(s): Machine Learning ◽ Learning Approaches ◽ Dynamic PET


A genetic programming-based approach and machine learning approaches to the classification of multiclass anti-malarial datasets

International Journal of Computational Biology and Drug Design ◽ 10.1504/ijcbdd.2018.096125 ◽ 2018 ◽ Vol 11 (4) ◽ pp. 275 ◽ Cited By ~ 1

Author(s): Madhulata Kumari ◽ Neeraj Tiwari ◽ Naidu Subbarao

Keyword(s): Machine Learning ◽ Genetic Programming ◽ Learning Approaches


Classification of Driver Distraction: A Comprehensive Analysis of Feature Generation, Machine Learning, and Input Measures

Human Factors: The Journal of the Human Factors and Ergonomics Society ◽ 10.1177/0018720819856454 ◽ 2019 ◽ Vol 62 (6) ◽ pp. 1019-1035 ◽ Cited By ~ 7

Author(s): Anthony D. McDonald ◽ Thomas K. Ferris ◽ Tyler A. Wiener

Keyword(s): Machine Learning ◽ Driving Behavior ◽ Driver Distraction ◽ Machine Learning Algorithms ◽ Physiological Data ◽ Learning Approaches ◽ Feature Generation ◽ Driver Performance ◽ Ensemble Machine Learning ◽ Vehicle Information

Objective: This study aimed to analyze a set of driver performance and physiological data using advanced machine learning approaches, including feature generation, to determine the best-performing algorithms for detecting driver distraction and predicting the source of distraction. Background: Distracted driving is a causal factor in many vehicle crashes, often resulting in injuries and deaths. As mobile devices and in-vehicle information systems become more prevalent, the ability to detect and mitigate driver distraction becomes more important. Method: This study trained 21 algorithms to identify when drivers were distracted by secondary cognitive and texting tasks. The algorithms used physiological and driving behavioral input processed with a comprehensive feature generation package, Time Series Feature Extraction based on Scalable Hypothesis tests. Results: A Random Forest algorithm, trained using only driving behavior measures and excluding driver physiological data, was the highest-performing algorithm for accurately classifying driver distraction. The most important input measures identified were lane offset, speed, and steering, whereas the most important feature types were standard deviation, quantiles, and nonlinear transforms. Conclusion: This work suggests that distraction detection algorithms may be improved by considering ensemble machine learning algorithms that are trained with driving behavior measures and nonstandard features. In addition, the study presents several new indicators of distraction derived from speed and steering measures. Application: Future development of distraction mitigation systems should focus on driver behavior-based algorithms that use complex feature generation techniques.
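Two of the feature types named above, standard deviation and quantiles, can be illustrated with a small sketch. It uses only the Python standard library rather than the feature generation package the study used, and the function name, feature names, and sample values are hypothetical.

```python
import statistics


def summary_features(samples, name):
    """Compute standard-deviation and quartile features for one signal window.

    samples is a sequence of numeric readings (for example, lane offset over a
    time window); name is used as a prefix for the returned feature names.
    """
    q25, median, q75 = statistics.quantiles(samples, n=4)  # default 'exclusive' method
    return {
        f"{name}_std": statistics.stdev(samples),
        f"{name}_q25": q25,
        f"{name}_median": median,
        f"{name}_q75": q75,
    }


# Hypothetical window of lane-offset readings, for illustration only.
lane_offset_window = [1.0, 2.0, 3.0, 4.0, 5.0]
features = summary_features(lane_offset_window, "lane_offset")
```

Applying such a function to successive windows of each input measure (lane offset, speed, steering) yields one feature row per window, which can then be passed to a classifier.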
