microbiome data analysis
Recently Published Documents


TOTAL DOCUMENTS

33
(FIVE YEARS 22)

H-INDEX

6
(FIVE YEARS 2)

2021 ◽  
Vol 12 ◽  
Author(s):  
Zhiyu Deng ◽  
Jinming Zhang ◽  
Junya Li ◽  
Xiujun Zhang

Unraveling the association between microbiome and plant phenotype can illustrate the effect of microbiome on host and then guide the agriculture management. Adequate identification of species and appropriate choice of models are two challenges in microbiome data analysis. Computational models of microbiome data could help in association analysis between the microbiome and plant host. The deep learning methods have been widely used to learn the microbiome data due to their powerful strength of handling the complex, sparse, noisy, and high-dimensional data. Here, we review the analytic strategies in the microbiome data analysis and describe the applications of deep learning models for plant–microbiome correlation studies. We also introduce the application cases of different models in plant–microbiome correlation analysis and discuss how to adapt the models on the critical steps in data processing. From the aspect of data processing manner, model structure, and operating principle, most deep learning models are suitable for the plant microbiome data analysis. The ability of feature representation and pattern recognition is the advantage of deep learning methods in modeling and interpretation for association analysis. Based on published computational experiments, the convolutional neural network and graph neural networks could be recommended for plant microbiome analysis.


2021 ◽  
Author(s):  
Zequn Sun ◽  
Jing Zhao ◽  
Zhaoqian Liu ◽  
Qin Ma ◽  
Dongjun Chung

AbstractIdentification of disease-associated microbial species is of great biological and clinical interest. However, this investigation still remains challenges due to heterogeneity in microbial composition between individuals, data quality issues, and complex relationships among species. In this paper, we propose a novel data purification algorithm that allows elimination of noise observations, which leads to increased statistical power to detect disease-associated microbial species. We illustrate the proposed algorithm using the metagenomic data generated from colorectal cancer patients.


mSystems ◽  
2021 ◽  
Author(s):  
Izaak Coleman ◽  
Tal Korem

A central paradigm in microbiome data analysis, which we term the genome-centric paradigm, is that a linear (nonbranching) DNA sequence is the ideal representation of a microbial genome. This representation is natural, as microbes indeed have nonbranching genomes.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Ruochen Jiang ◽  
Wei Vivian Li ◽  
Jingyi Jessica Li

AbstractA critical challenge in microbiome data analysis is the existence of many non-biological zeros, which distort taxon abundance distributions, complicate data analysis, and jeopardize the reliability of scientific discoveries. To address this issue, we propose the first imputation method for microbiome data—mbImpute—to identify and recover likely non-biological zeros by borrowing information jointly from similar samples, similar taxa, and optional metadata including sample covariates and taxon phylogeny. We demonstrate that mbImpute improves the power of identifying disease-related taxa from microbiome data of type 2 diabetes and colorectal cancer, and mbImpute preserves non-zero distributions of taxa abundances.


2021 ◽  
Vol 11 ◽  
Author(s):  
Quy Cao ◽  
Xinxin Sun ◽  
Karun Rajesh ◽  
Naga Chalasani ◽  
Kayla Gelow ◽  
...  

Background: The accuracy of microbial community detection in 16S rRNA marker-gene and metagenomic studies suffers from contamination and sequencing errors that lead to either falsely identifying microbial taxa that were not in the sample or misclassifying the taxa of DNA fragment reads. Removing contaminants and filtering rare features are two common approaches to deal with this problem. While contaminant detection methods use auxiliary sequencing process information to identify known contaminants, filtering methods remove taxa that are present in a small number of samples and have small counts in the samples where they are observed. The latter approach reduces the extreme sparsity of microbiome data and has been shown to correctly remove contaminant taxa in cultured “mock” datasets, where the true taxa compositions are known. Although filtering is frequently used, careful evaluation of its effect on the data analysis and scientific conclusions remains unreported. Here, we assess the effect of filtering on the alpha and beta diversity estimation as well as its impact on identifying taxa that discriminate between disease states.Results: The effect of filtering on microbiome data analysis is illustrated on four datasets: two mock quality control datasets where the same cultured samples with known microbial composition are processed at different labs and two disease study datasets. Results show that in microbiome quality control datasets, filtering reduces the magnitude of differences in alpha diversity and alleviates technical variability between labs while preserving the between samples similarity (beta diversity). In the disease study datasets, DESeq2 and linear discriminant analysis Effect Size (LEfSe) methods were used to identify taxa that are differentially abundant across groups of samples, and random forest models were used to rank features with the largest contribution toward disease classification. Results reveal that filtering retains significant taxa and preserves the model classification ability measured by the area under the receiver operating characteristic curve (AUC). The comparison between the filtering and the contaminant removal method shows that they have complementary effects and are advised to be used in conjunction.Conclusions: Filtering reduces the complexity of microbiome data while preserving their integrity in downstream analysis. This leads to mitigation of the classification methods' sensitivity and reduction of technical variability, allowing researchers to generate more reproducible and comparable results in microbiome data analysis.


2020 ◽  
Vol 21 (S6) ◽  
Author(s):  
Yuanyuan Ma ◽  
Junmin Zhao ◽  
Yingjun Ma

Abstract Background With the rapid development of high-throughput technique, multiple heterogeneous omics data have been accumulated vastly (e.g., genomics, proteomics and metabolomics data). Integrating information from multiple sources or views is challenging to obtain a profound insight into the complicated relations among micro-organisms, nutrients and host environment. In this paper we propose a multi-view Hessian regularization based symmetric nonnegative matrix factorization algorithm (MHSNMF) for clustering heterogeneous microbiome data. Compared with many existing approaches, the advantages of MHSNMF lie in: (1) MHSNMF combines multiple Hessian regularization to leverage the high-order information from the same cohort of instances with multiple representations; (2) MHSNMF utilities the advantages of SNMF and naturally handles the complex relationship among microbiome samples; (3) uses the consensus matrix obtained by MHSNMF, we also design a novel approach to predict the classification of new microbiome samples. Results We conduct extensive experiments on two real-word datasets (Three-source dataset and Human Microbiome Plan dataset), the experimental results show that the proposed MHSNMF algorithm outperforms other baseline and state-of-the-art methods. Compared with other methods, MHSNMF achieves the best performance (accuracy: 95.28%, normalized mutual information: 91.79%) on microbiome data. It suggests the potential application of MHSNMF in microbiome data analysis. Conclusions Results show that the proposed MHSNMF algorithm can effectively combine the phylogenetic, transporter, and metabolic profiles into a unified paradigm to analyze the relationships among different microbiome samples. Furthermore, the proposed prediction method based on MHSNMF has been shown to be effective in judging the types of new microbiome samples.


2020 ◽  
Author(s):  
Quy Xuan Cao ◽  
Xinxin Sun ◽  
Karun Rajesh ◽  
Naga Chalasani ◽  
Kayla Gelow ◽  
...  

Abstract Background: Accuracy of microbial community detection in 16S rRNA marker-gene and metagenomic studies suffers from contamination and sequencing errors that lead to either falsely identifying microbial taxa that were not in the sample or misclassifying the taxa of DNA fragment reads. Filtering is defined as removing taxa that are present in a small number of samples and have small counts in the samples where they are observed. This approach reduces extreme sparsity of microbiome data and has been shown to correctly remove contaminant taxa in cultured "mock" datasets, where the true taxa compositions are known. Although filtering is frequently used, careful evaluation of its effect on the data analysis and scientific conclusions remains unreported. Here, we assess the effect of filtering on the alpha and beta diversity estimation, as well as its impact on identifying taxa that discriminate between disease states. Results: The effect of filtering on microbiome data analysis is illustrated on four datasets: two mock quality control datasets where same cultured samples with known microbial composition are processed at different labs and two disease study datasets. Results show that in microbiome quality control datasets, filtering reduces the magnitude of differences in alpha diversity and alleviates technical variability between labs, while preserving between samples similarity (beta diversity). In the disease study datasets, DESeq2 and linear discriminant analysis Effect Size (LEfSe) methods were used to identify taxa that are differentially expressed across groups of samples, and random forest models to rank features with largest contribution towards disease classiffcation. Results reveal that filtering retains significant taxa and preserves the model classification ability measured by the area under the receiver operating characteristic curve (AUC). The comparison between filtering and contaminant removal method shows that they have complementary effects and are advised to be used in conjunction. Conclusions: Filtering reduces the complexity of microbiome data, while preserving their integrity in downstream analysis. This leads to mitigation of the classification methods' sensitivity and reduction of technical variability, allowing researchers to generate more reproducible and comparable results in microbiome data analysis.


Sign in / Sign up

Export Citation Format

Share Document