mAML: an automated machine learning pipeline with a microbiome repository for human disease classification

Mapping Intimacies ◽

10.1101/2020.02.11.943316 ◽

2020 ◽

Author(s):

Fenglong Yang ◽

Quan Zou

Keyword(s):

Machine Learning ◽

Human Disease ◽

High Performance ◽

Model Building ◽

Disease Classification ◽

Benchmark Datasets ◽

Automated Machine Learning ◽

Classification Tasks ◽

Interpretable Models ◽

Multi Class Classification

Abstract: Due to the concerted efforts to utilize microbial features to improve disease prediction capabilities, automated machine learning (AutoML) systems designed to remove the tedium of manually performing ML tasks are in great demand. Here we developed mAML, an ML model-building pipeline, which can automatically and rapidly generate optimized and interpretable models for personalized microbial classification tasks in a reproducible way. The pipeline is deployed on a web-based platform, and the server is user-friendly, flexible, and designed to scale according to specific requirements. This pipeline exhibits high performance for 13 benchmark datasets including both binary and multi-class classification tasks. In addition, to facilitate the application of mAML and expand the human disease-related microbiome learning repository, we developed the GMrepo ML repository (GMrepo Microbiome Learning repository) from the GMrepo database. The repository involves 120 microbial classification tasks for 85 human-disease phenotypes referring to 12,429 metagenomic samples and 38,643 amplicon samples. The mAML pipeline and the GMrepo ML repository are expected to be important resources for research in microbiology and algorithm development. Database URL: http://39.100.246.211:8050/Home

mAML: an automated machine learning pipeline with a microbiome repository for human disease classification

Database ◽

10.1093/database/baaa050 ◽

2020 ◽

Vol 2020 ◽

Author(s):

Fenglong Yang ◽

Quan Zou

Keyword(s):

Machine Learning ◽

Human Disease ◽

High Performance ◽

Model Building ◽

Disease Classification ◽

Benchmark Datasets ◽

Automated Machine Learning ◽

Classification Tasks ◽

Interpretable Models ◽

Multi Class Classification

Abstract: Due to the concerted efforts to utilize microbial features to improve disease prediction capabilities, automated machine learning (AutoML) systems aiming to remove the tedium of manually performing ML tasks are in great demand. Here we developed mAML, an ML model-building pipeline, which can automatically and rapidly generate optimized and interpretable models for personalized microbiome-based classification tasks in a reproducible way. The pipeline is deployed on a web-based platform, and the server is user-friendly, flexible, and designed to scale according to specific requirements. This pipeline exhibits high performance for 13 benchmark datasets including both binary and multi-class classification tasks. In addition, to facilitate the application of mAML and expand the human disease-related microbiome learning repository, we developed the GMrepo ML repository (GMrepo Microbiome Learning repository) from the GMrepo database. The repository involves 120 microbiome-based classification tasks for 85 human-disease phenotypes referring to 12,429 metagenomic samples and 38,643 amplicon samples. The mAML pipeline and the GMrepo ML repository are expected to be important resources for research in microbiology and algorithm development. Database URL: http://lab.malab.cn/soft/mAML

An IoT-Focused Intrusion Detection System Approach Based on Preprocessing Characterization for Cybersecurity Datasets

Sensors ◽

10.3390/s21020656 ◽

2021 ◽

Vol 21 (2) ◽

pp. 656

Author(s):

Xavier Larriva-Novo ◽

Víctor A. Villagrá ◽

Mario Vega-Barbas ◽

Diego Rivera ◽

Mario Sanz Rodrigo

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

High Performance ◽

Learning Algorithm ◽

Detection System ◽

Machine Learning Algorithms ◽

Statistical Characteristics ◽

Detection Techniques ◽

Traffic Characteristics ◽

Benchmark Datasets

Security in IoT networks is currently mandatory because of the large amount of data that has to be handled. These systems are vulnerable to several cybersecurity attacks, which are increasing in number and sophistication. For this reason, new intrusion detection techniques that are as accurate as possible for these scenarios have to be developed. Intrusion detection systems based on machine learning algorithms have already shown high performance in terms of accuracy. This research proposes the study and evaluation of several preprocessing techniques based on traffic categorization for a machine learning neural network algorithm. The evaluation uses two benchmark datasets, UGR16 and UNSW-NB15, and one of the most widely used datasets, KDD99. The preprocessing techniques were evaluated in accordance with scalar and normalization functions. All of these preprocessing models were applied to different sets of characteristics based on a categorization composed of four groups of features: basic connection features, content characteristics, statistical characteristics and, finally, a group composed of traffic-based features and connection direction-based traffic characteristics. The objective of this research is to evaluate this categorization by using various data preprocessing techniques to obtain the most accurate model. Our proposal shows that, by applying the categorization of network traffic and several preprocessing techniques, the accuracy can be enhanced by up to 45%. The preprocessing of a specific group of characteristics allows for greater accuracy, allowing the machine learning algorithm to correctly classify these parameters related to possible attacks.
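
The abstract above does not say which scalar and normalization functions were evaluated. As a rough illustration only, the sketch below assumes two common choices, min-max scaling and z-score standardization, applied to a single numeric traffic feature. The function names and the sample byte counts are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Linearly rescale values to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Center on the mean and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical per-connection byte counts (illustrative data only).
byte_counts = [0, 50, 100]
scaled = min_max_scale(byte_counts)
standardized = z_score(byte_counts)
```

Applying one function per feature group (for example, scaling only the statistical characteristics) is one way the per-group comparison described in the abstract could be set up.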

TPOT-NN: augmenting tree-based automated machine learning with neural network estimators

Genetic Programming and Evolvable Machines ◽

10.1007/s10710-021-09401-z ◽

2021 ◽

Author(s):

Joseph D. Romano ◽

Trang T. Le ◽

Weixuan Fu ◽

Jason H. Moore

Keyword(s):

Neural Network ◽

Machine Learning ◽

Binary Classification ◽

Inductive Learning ◽

Future Directions ◽

High Performing ◽

Learning Tasks ◽

Benchmark Datasets ◽

Automated Machine Learning ◽

Standard Tree

Abstract: Automated machine learning (AutoML) and artificial neural networks (ANNs) have revolutionized the field of artificial intelligence by yielding incredibly high-performing models to solve a myriad of inductive learning tasks. In spite of their successes, little guidance exists on when to use one versus the other. Furthermore, relatively few tools exist that allow the integration of both AutoML and ANNs in the same analysis to yield results combining both of their strengths. Here, we present TPOT-NN, a new extension to the tree-based AutoML software TPOT, and use it to explore the behavior of automated machine learning augmented with neural network estimators (AutoML+NN), particularly when compared to non-NN AutoML in the context of simple binary classification on a number of public benchmark datasets. Our observations suggest that TPOT-NN is an effective tool that achieves greater classification accuracy than standard tree-based AutoML on some datasets, with no loss in accuracy on others. We also provide preliminary guidelines for performing AutoML+NN analyses, and recommend possible future directions for AutoML+NN methods research, especially in the context of TPOT.

PyDL8.5: a Library for Learning Optimal Decision Trees

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/750 ◽

2020 ◽

Author(s):

Gaël Aglin ◽

Siegfried Nijssen ◽

Pierre Schaus

Keyword(s):

Machine Learning ◽

Decision Trees ◽

Efficient Algorithm ◽

Optimal Decision ◽

Learning Tasks ◽

Explainable Ai ◽

Classification Tasks ◽

Interpretable Models ◽

Limited Depth

Decision Trees (DTs) are widely used Machine Learning (ML) models with a broad range of applications. The interest in these models has increased even further in the context of Explainable AI (XAI), as decision trees of limited depth are very interpretable models. However, traditional algorithms for learning DTs are heuristic in nature; they may produce trees that are of suboptimal quality under depth constraints. We introduce PyDL8.5, a Python library to infer depth-constrained Optimal Decision Trees (ODTs). PyDL8.5 provides an interface for DL8.5, an efficient algorithm for inferring depth-constrained ODTs. The library provides an easy-to-use scikit-learn compatible interface. It can be used not only for classification tasks, but also for regression, clustering, and other tasks. We introduce an interface that allows users to easily implement these other learning tasks. We provide a number of examples of how to use this library.

Deep Learning Models for Pneumonia Identification and Classification Based on X-Ray Images

Traitement du signal ◽

10.18280/ts.380337 ◽

2021 ◽

Vol 38 (3) ◽

pp. 903-909

Author(s):

Veeranjaneyulu Naralasetti ◽

Reshmi Khadherbhi Shaik ◽

Gayatri Katepalli ◽

Jyostna Devi Bodapati

Keyword(s):

Experimental Studies ◽

Cross Entropy ◽

X Rays ◽

Weight Decay ◽

X Ray ◽

Proposed Model ◽

Benchmark Datasets ◽

Classification Tasks ◽

Multi Class Classification ◽

Fully Connected

Diagnosis based on chest X-rays is widely used and approved for the diagnosis of various diseases such as pneumonia. Manual screening of these X-ray images by a technician or radiologist requires expertise and is time-consuming. To address this, we propose an automated approach for the diagnosis of pneumonia that assists doctors in spotting infected areas in the X-ray images. We propose a deep Convolutional Neural Network (CNN) model for efficiently detecting the presence of pneumonia in the X-ray images. The proposed CNN is designed with 5 convolution blocks followed by 4 fully connected layers. In order to boost the performance of the model, we incorporate batch normalization, dynamic dropout, learning rate decay, and L2 regularization weight decay, along with the Adam optimizer and the binary cross-entropy loss function, while training the model using the backpropagation algorithm. The proposed model is validated on two publicly accessible benchmark datasets, and the experimental studies conducted on these datasets indicate that the proposed model is efficient. The suggested CNN architecture with the specified hyperparameters allows the model to outperform several existing models, achieving accuracies of 97.73% and 91.17% for the binary and multi-class pneumonia classification tasks, respectively.

Automated clinical computational biology: an interpretable machine learning framework to predict disease severity and stratify patients from clinical data

10.31219/osf.io/9xc2j ◽

2018 ◽

Author(s):

Soumya Banerjee

Keyword(s):

Machine Learning ◽

Disease Severity ◽

Clinical Data ◽

Model Building ◽

Learning Experience ◽

Machine Learning Algorithms ◽

Close Collaboration ◽

Learning Framework ◽

Novel Biomarkers ◽

Automated Machine Learning

We outline an automated computational and machine learning framework that predicts disease severity and stratifies patients. We apply our framework to available clinical data. Our algorithm automatically generates insights and predicts disease severity with minimal operator intervention. The computational framework presented here can be used to stratify patients, predict disease severity and propose novel biomarkers for disease. Insights from machine learning algorithms coupled with clinical data may help guide therapy, personalize treatment and help clinicians understand the change in disease over time. Computational techniques like these can be used in translational medicine in close collaboration with clinicians and healthcare providers. Our models are also interpretable, allowing clinicians with minimal machine learning experience to engage in model building. This work is a step towards automated machine learning in the clinic.

Automated Machine Learning for High-Throughput Image-Based Plant Phenotyping

Remote Sensing ◽

10.3390/rs13050858 ◽

2021 ◽

Vol 13 (5) ◽

pp. 858

Author(s):

Joshua C.O. Koh ◽

German Spangenberg ◽

Surya Kant

Keyword(s):

Machine Learning ◽

Image Classification ◽

Transfer Learning ◽

Precision Agriculture ◽

High Performance ◽

Mean Squared Error ◽

Plant Phenotyping ◽

Percentage Error ◽

Great Promise ◽

Automated Machine Learning

Automated machine learning (AutoML) has been heralded as the next wave in artificial intelligence with its promise to deliver high-performance end-to-end machine learning pipelines with minimal effort from the user. However, despite AutoML showing great promise for computer vision tasks, to the best of our knowledge, no study has used AutoML for image-based plant phenotyping. To address this gap in knowledge, we examined the application of AutoML for image-based plant phenotyping using wheat lodging assessment with unmanned aerial vehicle (UAV) imagery as an example. The performance of an open-source AutoML framework, AutoKeras, in image classification and regression tasks was compared to transfer learning using modern convolutional neural network (CNN) architectures. For image classification, which classified plot images as lodged or non-lodged, transfer learning with Xception and DenseNet-201 achieved the best classification accuracy of 93.2%, whereas AutoKeras had a 92.4% accuracy. For image regression, which predicted lodging scores from plot images, transfer learning with DenseNet-201 had the best performance (R2 = 0.8303, root mean-squared error (RMSE) = 9.55, mean absolute error (MAE) = 7.03, mean absolute percentage error (MAPE) = 12.54%), followed closely by AutoKeras (R2 = 0.8273, RMSE = 10.65, MAE = 8.24, MAPE = 13.87%). In both tasks, AutoKeras models had up to 40-fold faster inference times compared to the pretrained CNNs. AutoML has significant potential to enhance plant phenotyping capabilities applicable in crop breeding and precision agriculture.
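
The regression results above are reported as R2, RMSE, MAE and MAPE. For readers unfamiliar with these metrics, the following minimal sketch shows how they are conventionally computed from true and predicted scores. It assumes the standard textbook definitions, which the abstract does not spell out; the function name and the sample values are hypothetical and unrelated to the paper's data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE, MAE and MAPE (in percent) for paired score lists.

    Assumes no true value is zero (MAPE is undefined there) and that
    y_true is not constant (R^2 is undefined there).
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    ss_res = sum(e * e for e in errors)               # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares

    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "mape": 100 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n,
    }


# Hypothetical lodging scores, for illustration only.
metrics = regression_metrics([10, 20, 30, 40], [12, 18, 33, 39])
```

RMSE penalizes large errors more heavily than MAE, which is why the two can rank models differently even though both are in the units of the score.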

Doing more with less

Proceedings of the VLDB Endowment ◽

10.14778/3476249.3476262 ◽

2021 ◽

Vol 14 (11) ◽

pp. 2059-2072

Author(s):

Fatjon Zogaj ◽

José Pablo Cambronero ◽

Martin C. Rinard ◽

Jürgen Cito

Keyword(s):

Machine Learning ◽

Execution Time ◽

Time Budget ◽

Empirical Evaluation ◽

Predictive Performance ◽

Performance Metric ◽

Automated Machine Learning ◽

Classification Tasks ◽

The Impact ◽

User Intervention

Automated machine learning (AutoML) promises to democratize machine learning by automatically generating machine learning pipelines with little to no user intervention. Typically, a search procedure is used to repeatedly generate and validate candidate pipelines, maximizing a predictive performance metric, subject to a limited execution time budget. While this approach to generating candidates works well for small tabular datasets, the same procedure does not directly scale to larger tabular datasets with 100,000s of observations, often producing fewer candidate pipelines and yielding lower performance, given the same execution time budget. We carry out an extensive empirical evaluation of the impact that downsampling - reducing the number of rows in the input tabular dataset - has on the pipelines produced by a genetic-programming-based AutoML search for classification tasks.
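
Downsampling here means reducing the number of rows before the AutoML search runs. The sketch below shows one plausible way to do this, class-stratified random sampling, so that class proportions are roughly preserved. The abstract does not state which sampling scheme the authors used, so stratification, the function name, and the parameters are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_downsample(rows, labels, fraction, seed=0):
    """Keep roughly `fraction` of the rows from each class.

    At least one row per class is always kept, so rare classes
    do not disappear from the reduced dataset.
    """
    rng = random.Random(seed)  # local generator for reproducibility

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Sample without replacement within each class.
    kept = []
    for indices in by_class.values():
        k = max(1, round(len(indices) * fraction))
        kept.extend(rng.sample(indices, k))

    kept.sort()  # preserve the original row order
    return [rows[i] for i in kept], [labels[i] for i in kept]


# Hypothetical dataset: 100 rows, 80 of class 0 and 20 of class 1.
rows = list(range(100))
labels = [0] * 80 + [1] * 20
small_rows, small_labels = stratified_downsample(rows, labels, fraction=0.1)
```

With a fixed time budget, a smaller input lets the search evaluate more candidate pipelines, at the cost of validating each one on less data; that trade-off is what the evaluation described above examines.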

A Machine-Learning Model Based on Morphogeometric Parameters for RETICS Disease Classification and GUI Development

Applied Sciences ◽

10.3390/app10051874 ◽

2020 ◽

Vol 10 (5) ◽

pp. 1874 ◽

Cited By ~ 2

Author(s):

José M. Bolarín ◽

F. Cavas ◽

J.S. Velázquez ◽

J.L. Alió

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Regression Model ◽

Web Application ◽

High Performance ◽

Logistic Regression Model ◽

Early Stage ◽

Disease Classification ◽

Multivariate Logistic Regression Model ◽

Detection Model

This work pursues two objectives: defining a new concept of risk probability associated with suffering early-stage keratoconus, and classifying disease severity according to the RETICS (Thematic Network for Co-Operative Research in Health) scale. It recruited 169 individuals, 62 healthy and 107 with keratoconus, grouped according to the RETICS classification: 44 grade I; 18 grade II; 15 grade III; 15 grade IV; 15 grade V. Different demographic, optical, pachymetric and geometrical parameters were measured. The collected data were used for training two machine-learning models: a multivariate logistic regression model for early keratoconus detection and an ordinal logistic regression model for RETICS grade assessments. The early keratoconus detection model showed very good sensitivity, specificity and area under the ROC curve, with around 95% for training and 85% for validation. The variables that made the most significant contributions were gender, coma-like, central thickness, high-order aberrations and temporal thickness. The RETICS grade assessment also showed high-performance figures, albeit lower, with a global accuracy of 0.698 and a 95% confidence interval of 0.623–0.766. The most significant variables were CDVA, central thickness and temporal thickness. The developed web application allows the fast, objective and quantitative assessment of keratoconus in terms of early diagnosis and RETICS grading.
