scholarly journals mAML: an automated machine learning pipeline with a microbiome repository for human disease classification

2020 ◽  
Author(s):  
Fenglong Yang ◽  
Quan Zou

AbstractDue to the concerted efforts to utilize the microbial features to improve disease prediction capabilities, automated machine learning (AutoML) systems designed to get rid of the tediousness in manually performing ML tasks are in great demand. Here we developed mAML, an ML model-building pipeline, which can automatically and rapidly generate optimized and interpretable models for personalized microbial classification tasks in a reproducible way. The pipeline is deployed on a web-based platform and the server is user-friendly, flexible, and has been designed to be scalable according to the specific requirements. This pipeline exhibits high performance for 13 benchmark datasets including both binary and multi-class classification tasks. In addition, to facilitate the application of mAML and expand the human disease-related microbiome learning repository, we developed GMrepo ML repository (GMrepo Microbiome Learning repository) from the GMrepo database. The repository involves 120 microbial classification tasks for 85 human-disease phenotypes referring to 12,429 metagenomic samples and 38,643 amplicon samples. The mAML pipeline and the GMrepo ML repository are expected to be important resources for researches in microbiology and algorithm developments.Database URLhttp://39.100.246.211:8050/Home

Database ◽  
2020 ◽  
Vol 2020 ◽  
Author(s):  
Fenglong Yang ◽  
Quan Zou

Abstract Due to the concerted efforts to utilize the microbial features to improve disease prediction capabilities, automated machine learning (AutoML) systems aiming to get rid of the tediousness in manually performing ML tasks are in great demand. Here we developed mAML, an ML model-building pipeline, which can automatically and rapidly generate optimized and interpretable models for personalized microbiome-based classification tasks in a reproducible way. The pipeline is deployed on a web-based platform, while the server is user-friendly and flexible and has been designed to be scalable according to the specific requirements. This pipeline exhibits high performance for 13 benchmark datasets including both binary and multi-class classification tasks. In addition, to facilitate the application of mAML and expand the human disease-related microbiome learning repository, we developed GMrepo ML repository (GMrepo Microbiome Learning repository) from the GMrepo database. The repository involves 120 microbiome-based classification tasks for 85 human-disease phenotypes referring to 12 429 metagenomic samples and 38 643 amplicon samples. The mAML pipeline and the GMrepo ML repository are expected to be important resources for researches in microbiology and algorithm developments. Database URL: http://lab.malab.cn/soft/mAML


Sensors ◽  
2021 ◽  
Vol 21 (2) ◽  
pp. 656
Author(s):  
Xavier Larriva-Novo ◽  
Víctor A. Villagrá ◽  
Mario Vega-Barbas ◽  
Diego Rivera ◽  
Mario Sanz Rodrigo

Security in IoT networks is currently mandatory, due to the high amount of data that has to be handled. These systems are vulnerable to several cybersecurity attacks, which are increasing in number and sophistication. Due to this reason, new intrusion detection techniques have to be developed, being as accurate as possible for these scenarios. Intrusion detection systems based on machine learning algorithms have already shown a high performance in terms of accuracy. This research proposes the study and evaluation of several preprocessing techniques based on traffic categorization for a machine learning neural network algorithm. This research uses for its evaluation two benchmark datasets, namely UGR16 and the UNSW-NB15, and one of the most used datasets, KDD99. The preprocessing techniques were evaluated in accordance with scalar and normalization functions. All of these preprocessing models were applied through different sets of characteristics based on a categorization composed by four groups of features: basic connection features, content characteristics, statistical characteristics and finally, a group which is composed by traffic-based features and connection direction-based traffic characteristics. The objective of this research is to evaluate this categorization by using various data preprocessing techniques to obtain the most accurate model. Our proposal shows that, by applying the categorization of network traffic and several preprocessing techniques, the accuracy can be enhanced by up to 45%. The preprocessing of a specific group of characteristics allows for greater accuracy, allowing the machine learning algorithm to correctly classify these parameters related to possible attacks.


Author(s):  
Joseph D. Romano ◽  
Trang T. Le ◽  
Weixuan Fu ◽  
Jason H. Moore

AbstractAutomated machine learning (AutoML) and artificial neural networks (ANNs) have revolutionized the field of artificial intelligence by yielding incredibly high-performing models to solve a myriad of inductive learning tasks. In spite of their successes, little guidance exists on when to use one versus the other. Furthermore, relatively few tools exist that allow the integration of both AutoML and ANNs in the same analysis to yield results combining both of their strengths. Here, we present TPOT-NN—a new extension to the tree-based AutoML software TPOT—and use it to explore the behavior of automated machine learning augmented with neural network estimators (AutoML+NN), particularly when compared to non-NN AutoML in the context of simple binary classification on a number of public benchmark datasets. Our observations suggest that TPOT-NN is an effective tool that achieves greater classification accuracy than standard tree-based AutoML on some datasets, with no loss in accuracy on others. We also provide preliminary guidelines for performing AutoML+NN analyses, and recommend possible future directions for AutoML+NN methods research, especially in the context of TPOT.


Author(s):  
Gaël Aglin ◽  
Siegfried Nijssen ◽  
Pierre Schaus

Decision Trees (DTs) are widely used Machine Learning (ML) models with a broad range of applications. The interest in these models has increased even further in the context of Explainable AI (XAI), as decision trees of limited depth are very interpretable models. However, traditional algorithms for learning DTs are heuristic in nature; they may produce trees that are of suboptimal quality under depth constraints. We introduce PyDL8.5, a Python library to infer depth-constrained Optimal Decision Trees (ODTs). PyDL8.5 provides an interface for DL8.5, an efficient algorithm for inferring depth-constrained ODTs. The library provides an easy-to-use scikit-learn compatible interface. It cannot only be used for classification tasks, but also for regression, clustering, and other tasks. We introduce an interface that allows users to easily implement these other learning tasks. We provide a number of examples of how to use this library.


2021 ◽  
Vol 38 (3) ◽  
pp. 903-909
Author(s):  
Veeranjaneyulu Naralasetti ◽  
Reshmi Khadherbhi Shaik ◽  
Gayatri Katepalli ◽  
Jyostna Devi Bodapati

Diagnosis based on chest X-rays is widely used and approved for the diagnosis of various diseases such as Pneumonia. Manually screening of theses X-ray images technician or radiologist involves expertise and time consuming. Addressing this, we propose an automated approach for the diagnosis of pneumonia by assisting doctors in spotting infected areas in the X-ray images. We propose a deep Convolutional Neural Network (CNN) model for efficiently detecting the presence of pneumonia in the X-ray images. The proposed CNN is designed with 5 convolution blocks followed by 4 fully connected layers. In order to boost the performance of the model, we incorporate batch normalization, dynamic dropout, learning rate decay, L2 regularization weight decay along with Adam optimizer and binary Cross-Entropy loss function while training the model using back propagating algorithm. The proposed model is validated on two publicly accessible benchmark datasets, and the experimental studies conducted on these datasets indicate that the proposed model is efficient. The suggested CNN architecture with specified hyper parameters allows the model to outperform several existing models by achieving accuracy of 97.73% and 91.17% respectively for binary and multi-class classification tasks of pneumonia disease.


2018 ◽  
Author(s):  
soumya banerjee

We outline an automated computational and machine learning framework that predicts disease severity andstratifies patients. We apply our framework to available clinical data. Our algorithm automatically generatesinsights and predicts disease severity with minimal operator intervention. The computational frameworkpresented here can be used to stratify patients, predict disease severity and propose novel biomarkers fordisease. Insights from machine learning algorithms coupled with clinical data may help guide therapy,personalize treatment and help clinicians understand the change in disease over time. Computationaltechniques like these can be used in translational medicine in close collaboration with clinicians and healthcareproviders. Our models are also interpretable, allowing clinicians with minimal machine learning experience toengage in model building. This work is a step towards automated machine learning in the clinic.


2021 ◽  
Vol 13 (5) ◽  
pp. 858
Author(s):  
Joshua C.O. Koh ◽  
German Spangenberg ◽  
Surya Kant

Automated machine learning (AutoML) has been heralded as the next wave in artificial intelligence with its promise to deliver high-performance end-to-end machine learning pipelines with minimal effort from the user. However, despite AutoML showing great promise for computer vision tasks, to the best of our knowledge, no study has used AutoML for image-based plant phenotyping. To address this gap in knowledge, we examined the application of AutoML for image-based plant phenotyping using wheat lodging assessment with unmanned aerial vehicle (UAV) imagery as an example. The performance of an open-source AutoML framework, AutoKeras, in image classification and regression tasks was compared to transfer learning using modern convolutional neural network (CNN) architectures. For image classification, which classified plot images as lodged or non-lodged, transfer learning with Xception and DenseNet-201 achieved the best classification accuracy of 93.2%, whereas AutoKeras had a 92.4% accuracy. For image regression, which predicted lodging scores from plot images, transfer learning with DenseNet-201 had the best performance (R2 = 0.8303, root mean-squared error (RMSE) = 9.55, mean absolute error (MAE) = 7.03, mean absolute percentage error (MAPE) = 12.54%), followed closely by AutoKeras (R2 = 0.8273, RMSE = 10.65, MAE = 8.24, MAPE = 13.87%). In both tasks, AutoKeras models had up to 40-fold faster inference times compared to the pretrained CNNs. AutoML has significant potential to enhance plant phenotyping capabilities applicable in crop breeding and precision agriculture.


2021 ◽  
Vol 14 (11) ◽  
pp. 2059-2072
Author(s):  
Fatjon Zogaj ◽  
José Pablo Cambronero ◽  
Martin C. Rinard ◽  
Jürgen Cito

Automated machine learning (AutoML) promises to democratize machine learning by automatically generating machine learning pipelines with little to no user intervention. Typically, a search procedure is used to repeatedly generate and validate candidate pipelines, maximizing a predictive performance metric, subject to a limited execution time budget. While this approach to generating candidates works well for small tabular datasets, the same procedure does not directly scale to larger tabular datasets with 100,000s of observations, often producing fewer candidate pipelines and yielding lower performance, given the same execution time budget. We carry out an extensive empirical evaluation of the impact that downsampling - reducing the number of rows in the input tabular dataset - has on the pipelines produced by a genetic-programming-based AutoML search for classification tasks.


2020 ◽  
Vol 10 (5) ◽  
pp. 1874 ◽  
Author(s):  
José M. Bolarín ◽  
F. Cavas ◽  
J.S. Velázquez ◽  
J.L. Alió

This work pursues two objectives: defining a new concept of risk probability associated with suffering early-stage keratoconus, classifying disease severity according to the RETICS (Thematic Network for Co-Operative Research in Health) scale. It recruited 169 individuals, 62 healthy and 107 keratoconus diseased, grouped according to the RETICS classification: 44 grade I; 18 grade II; 15 grade III; 15 grade IV; 15 grade V. Different demographic, optical, pachymetric and eometrical parameters were measured. The collected data were used for training two machine-learning models: a multivariate logistic regression model for early keratoconus detection and an ordinal logistic regression model for RETICS grade assessments. The early keratoconus detection model showed very good sensitivity, specificity and area under ROC curve, with around 95% for training and 85% for validation. The variables that made the most significant contributions were gender, coma-like, central thickness, high-order aberrations and temporal thickness. The RETICS grade assessment also showed high-performance figures, albeit lower, with a global accuracy of 0.698 and a 95% confidence interval of 0.623–0.766. The most significant variables were CDVA, central thickness and temporal thickness. The developed web application allows the fast, objective and quantitative assessment of keratoconus in early diagnosis and RETICS grading terms.


Sign in / Sign up

Export Citation Format

Share Document