Systematic Comparisons for Composition Profiles, Taxonomic Levels, and Machine Learning Methods for Microbiome-Based Disease Prediction

2020 ◽  
Vol 7 ◽  
Author(s):  
Kuncheng Song ◽  
Fred A. Wright ◽  
Yi-Hui Zhou

Microbiome composition profiles generated from 16S rRNA sequencing have been extensively studied for their usefulness in phenotype trait prediction, including for complex diseases such as diabetes and obesity. These microbiome compositions have typically been quantified in the form of Operational Taxonomic Unit (OTU) count matrices. However, alternative approaches such as Amplicon Sequence Variants (ASVs) and direct k-mer sequence counts have also been used. The overall effect of these different types of predictors when used in concert with various machine learning methods has been difficult to assess, due to the varied combinations described in the literature. Here we provide an in-depth investigation of more than 1,000 combinations of these three clustering/counting methods, in combination with varied choices for normalization and filtering, grouping at various taxonomic levels, and the use of more than ten commonly used machine learning methods for phenotype prediction. The use of short k-mers, which have computational advantages and conceptual simplicity, is shown to be effective as a source for microbiome-based prediction. Among machine learning approaches, tree-based methods show consistent, though modest, advantages in prediction accuracy. We describe the various advantages and disadvantages of combinations of analysis approaches, and provide general observations to serve as a useful guide for future trait-prediction explorations using microbiome data.
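The abstract reports that short k-mers are an effective, conceptually simple source of predictors. A minimal sketch of turning one sample's 16S reads into a k-mer feature vector follows. It assumes reads arrive as plain nucleotide strings, uses relative-abundance normalization as just one of the normalization choices the study compares, and the function name `sample_kmer_profile` is hypothetical rather than taken from the authors' code.

```python
from collections import Counter
from itertools import product

VALID = set("ACGT")

def sample_kmer_profile(reads, k=4):
    """Relative k-mer frequencies pooled over all reads of one sample."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            window = read[i:i + k]
            # Skip windows containing N or other IUPAC ambiguity codes
            if set(window) <= VALID:
                counts[window] += 1
    # Fixed, lexicographically ordered feature list (4**k columns) so that
    # every sample maps to the same feature-vector layout
    features = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts.values())
    return features, [counts[f] / total if total else 0.0 for f in features]

features, x = sample_kmer_profile(["ACGTAC", "GGNCA"], k=2)
print(len(features), round(x[features.index("AC")], 3))  # prints: 16 0.286
```

Because every sample shares the same 4**k columns, the resulting vectors can be stacked directly into a feature matrix for any of the classifiers discussed, without the clustering step that OTU or ASV tables require.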

2021 ◽  
Vol 10 (4) ◽  
pp. 199
Author(s):  
Francisco M. Bellas Aláez ◽  
Jesus M. Torres Palenzuela ◽  
Evangelos Spyrakos ◽  
Luis González Vilas

This work presents new prediction models based on recent developments in machine learning methods, such as Random Forest (RF) and AdaBoost, and compares them with more classical approaches, i.e., support vector machines (SVMs) and neural networks (NNs). The models predict Pseudo-nitzschia spp. blooms in the Galician Rias Baixas. This work builds on a previous study by the authors (doi.org/10.1016/j.pocean.2014.03.003) but uses an extended database (from 2002 to 2012) and new algorithms. Our results show that RF and AdaBoost provide better predictions than SVMs and NNs, with improved performance metrics and a better balance between sensitivity and specificity. The classical machine learning approaches show higher sensitivities, but at the cost of lower specificity and a higher percentage of false alarms (lower precision). These results suggest that the newer algorithms (RF and AdaBoost) adapt better to unbalanced datasets. Our models could be implemented operationally to establish a short-term prediction system.
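The comparison above turns on sensitivity, specificity, and precision on an unbalanced dataset. The sketch below computes the three metrics from binary labels; the function name `binary_metrics` and the toy label vectors are invented for illustration and do not come from the study.

```python
def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity and precision for 0/1 labels (1 = bloom)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed blooms

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # share of real blooms detected
        "specificity": ratio(tn, tn + fp),  # share of non-bloom cases correctly cleared
        "precision": ratio(tp, tp + fp),    # share of alarms that were real blooms
    }

# Invented, unbalanced toy labels: 2 blooms among 10 cases
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
eager = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]     # flags often
cautious = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # flags rarely
print(binary_metrics(y_true, eager))
# {'sensitivity': 1.0, 'specificity': 0.625, 'precision': 0.4}
print(binary_metrics(y_true, cautious))
# {'sensitivity': 0.5, 'specificity': 1.0, 'precision': 1.0}
```

The numbers are illustrative only. They show how, when the positive class is rare, a model that raises alarms freely can reach full sensitivity while most of its alarms are false, which is the trade-off the abstract attributes to the classical approaches.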


Cancers ◽  
2021 ◽  
Vol 13 (11) ◽  
pp. 2764
Author(s):  
Xin Yu Liew ◽  
Nazia Hameed ◽  
Jeremie Clos

A computer-aided diagnosis (CAD) expert system is a powerful tool to efficiently assist a pathologist in achieving an early diagnosis of breast cancer. The system identifies the presence of cancer in breast tissue samples and distinguishes between cancer stages. In a standard CAD system, the main process involves image pre-processing, segmentation, feature extraction, feature selection, classification, and performance evaluation. In this review paper, we review the existing state-of-the-art machine learning approaches applied at each stage, covering both conventional and deep learning methods, compare the methods within each stage, and provide technical details with their advantages and disadvantages. The aims are to investigate the impact of CAD systems using histopathology images, to investigate deep learning methods that outperform conventional methods, and to provide a summary that future researchers can use to analyse and improve the existing techniques. Lastly, we discuss the research gaps of existing machine learning approaches for implementation and propose guidelines for future research.


Author(s):  
Basant Agarwal ◽  
Namita Mittal

Opinion mining, or sentiment analysis, is the study of people's opinions or sentiments, expressed in text, towards entities such as products and services. It has always been important to know what other people think. With the rapid growth in the availability and popularity of online review sites, blogs, forums, and social networking sites, the need to analyse and understand these reviews has arisen. The main approaches to sentiment analysis can be categorized into semantic orientation-based, knowledge-based, and machine learning-based approaches. This chapter surveys the machine learning approaches applied to sentiment analysis applications, with the main emphasis on research that applies machine learning methods to document-level sentiment classification. Machine learning-based sentiment classification proceeds in the following phases, each discussed in detail in this chapter: (1) feature extraction, (2) feature weighting schemes, (3) feature selection, and (4) machine learning methods. The chapter also discusses the standard freely available benchmark datasets and evaluation methods for sentiment analysis. The authors conclude with a comparative study of some state-of-the-art methods for sentiment analysis and some possible future research directions in opinion mining and sentiment analysis.
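The four phases named above can be sketched end to end in a few lines. This is a toy illustration under stated assumptions, not the chapter's method: whitespace unigrams for extraction, binary presence for weighting, a document-frequency threshold for selection, and multinomial Naive Bayes as the learner are all choices made here for brevity, and the training sentences are invented.

```python
import math
from collections import Counter

def extract(doc):
    # (1) Feature extraction: lowercase whitespace-separated unigrams
    return doc.lower().split()

def weight(tokens):
    # (2) Feature weighting: binary presence (TF or TF-IDF are common alternatives)
    return set(tokens)

def select(docs, min_df=2):
    # (3) Feature selection: keep terms occurring in at least min_df training documents
    df = Counter(t for d in docs for t in weight(extract(d)))
    return {t for t, n in df.items() if n >= min_df}

def train_nb(docs, labels, vocab):
    # (4) Learning: multinomial Naive Bayes with add-one (Laplace) smoothing
    classes = sorted(set(labels))
    log_prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    term_counts = {c: Counter() for c in classes}
    for d, y in zip(docs, labels):
        term_counts[y].update(t for t in weight(extract(d)) if t in vocab)
    log_lik = {}
    for c in classes:
        total = sum(term_counts[c].values())
        log_lik[c] = {t: math.log((term_counts[c][t] + 1) / (total + len(vocab)))
                      for t in vocab}
    return log_prior, log_lik

def predict_nb(model, doc, vocab):
    log_prior, log_lik = model
    feats = [t for t in weight(extract(doc)) if t in vocab]  # out-of-vocabulary terms ignored
    # Pick the class with the highest log posterior score
    return max(log_prior, key=lambda c: log_prior[c] + sum(log_lik[c][t] for t in feats))

train = ["great product works great", "really great value",
         "terrible product broke", "really terrible support"]
labels = ["pos", "pos", "neg", "neg"]
vocab = select(train)  # keeps great, product, really, terrible
model = train_nb(train, labels, vocab)
print(predict_nb(model, "great value", vocab))          # pos
print(predict_nb(model, "terrible experience", vocab))  # neg
```

Keeping each phase in its own function makes it easy to swap in a different weighting scheme or selection criterion while holding the rest of the pipeline fixed, which is the kind of comparison the chapter surveys.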


Big Data ◽  
2016 ◽  
pp. 1917-1933
Author(s):  
Basant Agarwal ◽  
Namita Mittal



Author(s):  
Derya Yiltas-Kaplan

This chapter examines machine learning in the context of the architecture of software-defined networks (SDNs) and their security mechanisms. Machine learning has been studied widely for traditional network problems, but only a limited number of recent studies connect SDN security with machine learning approaches. The main reason is that the SDN architecture emerged only recently and differs from that of traditional networks. These structural differences are also summarized and compared in this chapter. After describing the main properties of the network architectures, the chapter introduces several intrusion detection studies on SDN and analyzes their advantages and disadvantages. In this way, the chapter also aims to be the first organized guide that presents the referenced studies on SDN security and artificial intelligence together.


2020 ◽  
Vol 30 (Suppl 1) ◽  
pp. 217-228 ◽  
Author(s):  
Sanjay Basu ◽  
James H. Faghmous ◽  
Patrick Doupe

Precision medicine research designed to reduce health disparities often involves studying multi-level datasets to understand how diseases manifest disproportionately in one group over another, and how scarce health care resources can be directed precisely to those most at risk for disease. In this article, we provide a structured tutorial for medical and public health researchers on the application of machine learning methods to conduct precision medicine research designed to reduce health disparities. We review key terms and concepts for understanding machine learning papers, including supervised and unsupervised learning, regularization, cross-validation, bagging, and boosting. We also review metrics for evaluating machine learners, as well as major families of learning approaches, including tree-based learning, deep learning, and ensemble learning. We highlight the advantages and disadvantages of different learning approaches, describe strategies for interpreting “black box” models, and demonstrate the application of common methods to an example dataset with open-source statistical code in R.
Ethn Dis. 2020;30(Suppl 1):217-228; doi:10.18865/ed.30.S1.217
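Cross-validation is one of the concepts the tutorial reviews. The tutorial's own code is in R; the Python sketch below only illustrates plain (unstratified) k-fold cross-validation. The one-feature nearest-centroid classifier (`fit_centroid`, `predict_centroid`) and the data are hypothetical stand-ins for a real learner and dataset.

```python
import random

def k_fold_indices(n, k, seed=0):
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Deal shuffled indices round-robin so fold sizes differ by at most one
    return [idx[i::k] for i in range(k)]

def cross_validate(X, y, fit, predict, k=5, seed=0):
    """Return the accuracy on each held-out fold."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in test_set]
        # The model only ever sees the training part of each split
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores

# Hypothetical learner: nearest class centroid on a single numeric feature
def fit_centroid(X, y):
    return {c: sum(x for x, t in zip(X, y) if t == c) / y.count(c) for c in set(y)}

def predict_centroid(model, x):
    return min(model, key=lambda c: abs(x - model[c]))

X = [0.1, 0.2, 0.3, 0.4, 0.5, 2.1, 2.2, 2.3, 2.4, 2.5]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
scores = cross_validate(X, y, fit_centroid, predict_centroid, k=5)
print(scores, sum(scores) / len(scores))  # [1.0, 1.0, 1.0, 1.0, 1.0] 1.0
```

Any step that learns from the data, such as feature selection or normalization, belongs inside `fit` so that it is repeated on each training split. With unbalanced outcomes, stratified folds, which preserve class proportions in every fold, are usually preferred over the plain shuffling used here.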


2019 ◽  
Vol 24 (34) ◽  
pp. 3998-4006
Author(s):  
Shijie Fan ◽  
Yu Chen ◽  
Cheng Luo ◽  
Fanwang Meng

Background: With the rise of big data, machine learning is coming into its own. Given the huge amounts of epigenetic data generated by biological experiments and clinical practice, machine learning can help detect epigenetic features in the genome, find correlations between phenotypes and histone or gene modifications, accelerate the screening of lead compounds targeting epigenetic diseases, and support many other aspects of epigenetics research, thereby advancing the goals of precision medicine. Methods: In this minireview, we focus on the fundamentals and applications of machine learning methods that are regularly used in the epigenetics field and explain their features. Their advantages and disadvantages are also discussed. Results: Machine learning algorithms have accelerated studies in precision medicine targeting epigenetic diseases. Conclusion: To make full use of machine learning algorithms, researchers should become familiar with their strengths and weaknesses, so that they can benefit from big data by choosing the most suitable method(s).


2019 ◽  
Vol 35 (14) ◽  
pp. i31-i40 ◽  
Author(s):  
Erfan Sayyari ◽  
Ban Kawas ◽  
Siavash Mirarab

Motivation: Learning associations of traits with the microbial composition of a set of samples is a fundamental goal in microbiome studies. Recently, machine learning methods have been explored for this goal, with some promise. However, compared with other fields, microbiome data are high-dimensional and scarce, leading to a high-dimensional, low-sample-size, under-determined system. Moreover, microbiome data are often unbalanced and biased. Given such training data, machine learning methods often fail to perform a classification task with sufficient accuracy. Lack of signal is especially problematic when classes are represented unevenly in the training data, with some classes under-represented. The presence of inter-correlations among subsets of observations further compounds these issues. As a result, machine learning methods have had only limited success in predicting many traits from the microbiome. Data augmentation, which consists of building synthetic samples and adding them to the training data, is a technique that has proved helpful for many machine learning tasks. Results: In this paper, we propose a new data augmentation technique for classifying phenotypes based on the microbiome. Our algorithm, called TADA, uses available data and a statistical generative model to create new samples augmenting existing ones, addressing the issue of low sample size. In generating new samples, TADA takes into account phylogenetic relationships between microbial species. On two real datasets, we show that adding these synthetic samples to the training set improves the accuracy of downstream classification, especially when the training data have an unbalanced representation of classes. Availability and implementation: TADA is available at https://github.com/tada-alg/TADA. Supplementary information: Supplementary data are available at Bioinformatics online.
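To make the idea of generative data augmentation concrete, here is a deliberately simplified sketch. It is not TADA: it ignores the phylogenetic relationships that TADA accounts for and instead perturbs each real count vector with a Dirichlet-multinomial draw centred on that sample. The function `augment_class`, its `pseudocount` parameter, and the example counts are hypothetical; TADA's actual implementation is at the GitHub link above.

```python
import random

def augment_class(count_vectors, n_new, pseudocount=0.5, seed=0):
    """Create n_new synthetic taxon-count vectors for one class.

    Simplified, phylogeny-unaware stand-in for generative augmentation:
    each synthetic sample draws taxon proportions from a Dirichlet centred
    on a real sample, then redistributes that sample's reads multinomially.
    """
    rng = random.Random(seed)
    synthetic = []
    for j in range(n_new):
        base = count_vectors[j % len(count_vectors)]  # cycle through real samples
        reads = sum(base)                              # keep the source sequencing depth
        # Dirichlet draw via normalised Gamma variates (pseudocount keeps alpha > 0)
        gammas = [rng.gammavariate(c + pseudocount, 1.0) for c in base]
        total = sum(gammas)
        probs = [g / total for g in gammas]
        # Multinomial draw: assign each read to a taxon
        counts = [0] * len(base)
        for taxon in rng.choices(range(len(base)), weights=probs, k=reads):
            counts[taxon] += 1
        synthetic.append(counts)
    return synthetic

minority = [[10, 0, 5], [8, 2, 4]]  # hypothetical counts for an under-represented class
new = augment_class(minority, n_new=4, seed=1)
print([sum(r) for r in new])  # [15, 14, 15, 14]: each copy keeps its source sample's depth
```

Synthetic samples like these should be generated from the training split only, after any train/test split, so that no information from held-out samples leaks into training.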


PLoS ONE ◽  
2021 ◽  
Vol 16 (2) ◽  
pp. e0246102
Author(s):  
Daekyum Kim ◽  
Sang-Hun Kim ◽  
Taekyoung Kim ◽  
Brian Byunghyun Kang ◽  
Minhyuk Lee ◽  
...  

Soft robots have been extensively researched due to their flexible, deformable, and adaptive characteristics. However, compared with rigid robots, soft robots pose challenges in modeling, calibration, and control, because the innate characteristics of soft materials can cause complex behaviors arising from non-linearity and hysteresis. To overcome these limitations, recent studies have applied various approaches based on machine learning. This paper presents existing machine learning techniques in the soft robotics field and categorizes the implementation of machine learning approaches in different soft robotic applications, including soft sensors, soft actuators, and soft wearable robots. It presents an analysis of trends in machine learning approaches across different types of soft robot applications, along with the current limitations of the research field, followed by a summary of the existing machine learning methods for soft robots.


2018 ◽  
Author(s):  
Nicholas A. Bokulich ◽  
Matthew Dillon ◽  
Evan Bolyen ◽  
Benjamin D. Kaehler ◽  
Gavin A. Huttley ◽  
...  

Microbiome studies often aim to predict outcomes or differentiate samples based on their microbial compositions, tasks that can be efficiently performed by supervised learning methods. Here we present a benchmark comparison of supervised learning classifiers and regressors implemented in scikit-learn, a Python-based machine-learning library. We additionally present q2-sample-classifier, a plugin for the QIIME 2 microbiome bioinformatics framework that facilitates application of the scikit-learn classifiers to microbiome data. Random forest, extra trees, and gradient boosting models demonstrate the highest performance for both supervised classification and regression of microbiome data. Automated feature selection and hyperparameter tuning enhance the performance of most methods but may not be necessary under all circumstances. The q2-sample-classifier plugin makes these methods more accessible and interpretable to a broad audience of microbiologists, clinicians, and others who wish to use supervised learning methods for predicting sample characteristics based on microbiome composition. The q2-sample-classifier source code is available at https://github.com/qiime2/q2-sample-classifier. It is released under a BSD-3-Clause license and is freely available, including for commercial use.

