The Rest Is Noise: Finding Signals in Lung Microbiome Data Analysis

Author(s):  
Alba Boix-Amorós ◽  
Alison G. Lee ◽  
Jose C. Clemente
2017 ◽  
Vol 34 (8) ◽  
pp. 1411-1413 ◽  
Author(s):  
Nick Weber ◽  
David Liou ◽  
Jennifer Dommer ◽  
Philip MacMenamin ◽  
Mariam Quiñones ◽  
...  

2020 ◽  
Vol 21 (S6) ◽  
Author(s):  
Yuanyuan Ma ◽  
Junmin Zhao ◽  
Yingjun Ma

Abstract Background With the rapid development of high-throughput techniques, large volumes of heterogeneous omics data (e.g., genomics, proteomics and metabolomics data) have accumulated. Integrating information from multiple sources or views to gain insight into the complicated relations among micro-organisms, nutrients and host environment is challenging. In this paper we propose a multi-view Hessian regularization based symmetric nonnegative matrix factorization algorithm (MHSNMF) for clustering heterogeneous microbiome data. Compared with many existing approaches, the advantages of MHSNMF are: (1) MHSNMF combines multiple Hessian regularization to leverage the high-order information from the same cohort of instances with multiple representations; (2) MHSNMF utilizes the advantages of SNMF and naturally handles the complex relationship among microbiome samples; (3) using the consensus matrix obtained by MHSNMF, we also design a novel approach to predict the classification of new microbiome samples. Results We conduct extensive experiments on two real-world datasets (Three-source dataset and Human Microbiome Plan dataset). The experimental results show that the proposed MHSNMF algorithm outperforms other baseline and state-of-the-art methods. Compared with other methods, MHSNMF achieves the best performance (accuracy: 95.28%, normalized mutual information: 91.79%) on microbiome data, which suggests the potential application of MHSNMF in microbiome data analysis. Conclusions Results show that the proposed MHSNMF algorithm can effectively combine the phylogenetic, transporter, and metabolic profiles into a unified paradigm to analyze the relationships among different microbiome samples. Furthermore, the proposed prediction method based on MHSNMF has been shown to be effective in judging the types of new microbiome samples.
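The symmetric NMF building block mentioned in the abstract can be illustrated with a minimal sketch. This is plain single-view SNMF with a standard damped multiplicative update, not the authors' MHSNMF: it has no Hessian regularization and no multi-view consensus step. The function names, the toy similarity matrix, and the parameters (`rank`, `n_iter`, `beta`) are assumptions made for illustration.

```python
import numpy as np

def symmetric_nmf(A, rank, n_iter=300, beta=0.5, seed=0, eps=1e-10):
    """Approximate a symmetric nonnegative matrix A by H @ H.T with H >= 0."""
    rng = np.random.default_rng(seed)
    # Strictly positive random start, so multiplicative updates stay nonnegative.
    H = rng.random((A.shape[0], rank)) + 0.1
    for _ in range(n_iter):
        numer = A @ H
        denom = H @ (H.T @ H) + eps  # eps guards against division by zero
        # Damped multiplicative update for min ||A - H H^T||_F^2.
        H *= (1.0 - beta) + beta * numer / denom
    return H

def reconstruction_error(A, H):
    """Frobenius norm of the residual A - H H^T."""
    return np.linalg.norm(A - H @ H.T, "fro")

# Hypothetical sample-by-sample similarity matrix with two groups of three samples.
A = np.full((6, 6), 0.05)
A[:3, :3] = 1.0
A[3:, 3:] = 1.0

H = symmetric_nmf(A, rank=2)
labels = H.argmax(axis=1)  # assign each sample to its dominant factor
```

Reading cluster labels from the largest entry in each row of `H` is one common convention; the abstract does not say how MHSNMF derives labels from its consensus matrix.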


2020 ◽  
Author(s):  
Quy Xuan Cao ◽  
Xinxin Sun ◽  
Karun Rajesh ◽  
Naga Chalasani ◽  
Kayla Gelow ◽  
...  

Abstract Background: Accuracy of microbial community detection in 16S rRNA marker-gene and metagenomic studies suffers from contamination and sequencing errors that lead to either falsely identifying microbial taxa that were not in the sample or misclassifying the taxa of DNA fragment reads. Filtering is defined as removing taxa that are present in a small number of samples and have small counts in the samples where they are observed. This approach reduces extreme sparsity of microbiome data and has been shown to correctly remove contaminant taxa in cultured "mock" datasets, where the true taxa compositions are known. Although filtering is frequently used, careful evaluation of its effect on the data analysis and scientific conclusions remains unreported. Here, we assess the effect of filtering on alpha and beta diversity estimation, as well as its impact on identifying taxa that discriminate between disease states. Results: The effect of filtering on microbiome data analysis is illustrated on four datasets: two mock quality control datasets, where the same cultured samples with known microbial composition are processed at different labs, and two disease study datasets. Results show that in microbiome quality control datasets, filtering reduces the magnitude of differences in alpha diversity and alleviates technical variability between labs, while preserving between-sample similarity (beta diversity). In the disease study datasets, DESeq2 and linear discriminant analysis Effect Size (LEfSe) methods were used to identify taxa that are differentially expressed across groups of samples, and random forest models were used to rank the features with the largest contribution towards disease classification. Results reveal that filtering retains significant taxa and preserves the model classification ability measured by the area under the receiver operating characteristic curve (AUC). The comparison between filtering and a contaminant removal method shows that they have complementary effects, and using them in conjunction is advised. Conclusions: Filtering reduces the complexity of microbiome data while preserving their integrity in downstream analysis. This leads to mitigation of the classification methods' sensitivity and reduction of technical variability, allowing researchers to generate more reproducible and comparable results in microbiome data analysis.
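The filtering rule as the abstract defines it (drop taxa that are present in few samples and have small counts where observed) can be sketched as follows. The thresholds `min_samples` and `min_count`, the function name, and the toy table are hypothetical; the abstract does not state the cutoffs the authors used.

```python
def filter_rare_taxa(table, min_samples=2, min_count=3):
    """Drop taxa that are both rare and low-count.

    table maps a taxon name to its list of per-sample read counts.
    A taxon is removed only if it is observed in fewer than `min_samples`
    samples AND every observed count is below `min_count`.
    """
    kept = {}
    for taxon, counts in table.items():
        observed = [c for c in counts if c > 0]
        is_rare = len(observed) < min_samples
        is_low = all(c < min_count for c in observed)
        if not (is_rare and is_low):
            kept[taxon] = counts
    return kept

# Hypothetical count table: 5 taxa across 4 samples.
toy_table = {
    "taxon_A": [10, 12, 9, 11],  # prevalent, kept
    "taxon_B": [0, 0, 2, 0],     # one sample, small count, dropped
    "taxon_C": [1, 0, 0, 0],     # one sample, small count, dropped
    "taxon_D": [5, 7, 0, 6],     # prevalent, kept
    "taxon_E": [0, 0, 50, 0],    # one sample but abundant, kept
}

filtered = filter_rare_taxa(toy_table)
```

Requiring both conditions keeps a taxon that is rare but abundant in the one sample where it appears (`taxon_E`), which matches the "and" in the abstract's definition.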


2019 ◽  
Author(s):  
Jinbing Bai ◽  
Ileen Jhaney ◽  
Jessica Wells

BACKGROUND Cloud computing for microbiome data sets can significantly increase working efficiencies and expedite the translation of research findings into clinical practice. The Amazon Web Services (AWS) cloud provides an invaluable option for microbiome data storage, computation, and analysis. OBJECTIVE The goals of this study were to develop a microbiome data analysis pipeline by using the AWS cloud and to conduct a proof-of-concept test for microbiome data storage, processing, and analysis. METHODS A multidisciplinary team was formed to develop and test a reproducible microbiome data analysis pipeline with multiple AWS cloud services that could be used for storage, computation, and data analysis. The microbiome data analysis pipeline developed in AWS was tested by using two data sets: 19 vaginal microbiome samples and 50 gut microbiome samples. RESULTS Using AWS features, we developed a microbiome data analysis pipeline that included Amazon Simple Storage Service for microbiome sequence storage, Linux Elastic Compute Cloud (EC2) instances (ie, servers) for data computation and analysis, and security keys to create and manage the use of encryption for the pipeline. Bioinformatics and statistical tools (ie, Quantitative Insights Into Microbial Ecology 2 and RStudio) were installed within the Linux EC2 instances to run microbiome statistical analysis. The microbiome data analysis pipeline was run through command-line interfaces within the Linux operating system or the Mac operating system. Using this new pipeline, we were able to successfully process and analyze 50 gut microbiome samples within 4 hours at a very low cost (a c4.4xlarge EC2 instance costs $0.80 per hour). Gut microbiome findings regarding diversity, taxonomy, and abundance analyses were easily shared within our research team. CONCLUSIONS Building a microbiome data analysis pipeline with the AWS cloud is feasible. This pipeline is highly reliable, computationally powerful, and cost effective. Our AWS-based microbiome analysis pipeline provides an efficient tool to conduct microbiome data analysis.
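The cost figures quoted in the abstract imply a simple back-of-the-envelope estimate, sketched below. It uses only the numbers given there (up to 4 hours, $0.80 per hour, 50 samples) and assumes a single instance billed at that hourly rate; storage, data transfer, and setup time are ignored, and the function name is hypothetical.

```python
def estimate_run_cost(hours, hourly_rate_usd, n_samples):
    """Return (total cost, cost per sample) in USD for one instance."""
    total = hours * hourly_rate_usd
    return round(total, 2), round(total / n_samples, 4)

# Figures from the abstract; "within 4 hours" makes this an upper bound.
total_usd, per_sample_usd = estimate_run_cost(
    hours=4, hourly_rate_usd=0.80, n_samples=50
)
```

Under these assumptions the compute cost is at most $3.20 for the whole run, or about $0.064 per sample.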


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Teresia M. Buza ◽  
Triza Tonui ◽  
Francesca Stomeo ◽  
Christian Tiambo ◽  
Robab Katani ◽  
...  

2020 ◽  
Vol 14 (1) ◽  
pp. 494-517
Author(s):  
Boyu Ren ◽  
Sergio Bacallado ◽  
Stefano Favaro ◽  
Tommi Vatanen ◽  
Curtis Huttenhower ◽  
...  
