Statistical Object Data Analysis of Taxonomic Trees from Human Microbiome Data



Data Analysis Strategies for Microbiome Studies in Human Populations—a Systematic Review of Current Practice

mSystems ◽

10.1128/msystems.01154-20 ◽

2021 ◽

Vol 6 (1) ◽

Author(s):

Sven Kleine Bardenhorst ◽

Tom Berger ◽

Frank Klawonn ◽

Marius Vital ◽

André Karch ◽

...

Keyword(s):

Data Structure ◽

Data Analysis ◽

Current Practice ◽

Human Microbiome ◽

Rapid Progression ◽

Human Populations ◽

Analysis Strategies ◽

Research Questions ◽

Study Designs ◽

Microbiome Data

ABSTRACT Reproducibility is a major issue in microbiome studies, caused in part by a lack of consensus on data analysis strategies. Microbiome data are high-dimensional, zero-inflated, and compositional, which makes them challenging to analyze because they often violate the assumptions of classic statistical methods. With advances in human microbiome research, research questions and study designs are increasing in complexity, so more sophisticated data analysis concepts are being applied. To improve current practice in the analysis of microbiome studies, it is important to understand what kinds of research questions are asked and which tools are used to answer them. We conducted a systematic literature review considering all publications focusing on the analysis of human microbiome data from June 2018 to June 2019. Of 1,444 studies screened, 419 fulfilled the inclusion criteria. Information about research questions, study designs, and analysis strategies was extracted. The results confirmed the expected shift to more advanced research questions, as one-third of the studies analyzed clustered data. Although heterogeneity in the methods used was found at every stage of the analysis process, it was largest for differential abundance testing. Particularly where the underlying data structure was clustered, we identified a lack of use of methods that appropriately addressed that structure while taking into account additional dependencies in the data. Our results confirm considerable heterogeneity in analysis strategies among microbiome studies; increasingly complex research questions require better guidance for analysis strategies.

IMPORTANCE The human microbiome has emerged as an important factor in the development of health and disease. Growing interest in this topic has led to an increasing number of studies investigating the human microbiome using high-throughput sequencing methods. However, the development of suitable analytical methods for analyzing microbiome data has not kept pace with the rapid progression in the field. It is crucial to understand current practice to identify the scope for development. Our results highlight the need for an extensive evaluation of the strengths and shortcomings of existing methods in order to guide the choice of proper analysis strategies. We have identified where new methods could be designed to address more advanced research questions while taking into account the complex structure of the data.
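
The abstract above describes microbiome data as zero-inflated and compositional. As a rough illustration of what that means in practice, the sketch below computes the fraction of zeros in one sample and applies a centered log-ratio (CLR) transform, a common way to handle compositional data. It is not taken from the review: the function names, the toy counts, and the pseudocount of 0.5 are all assumptions chosen for illustration.

```python
import math

def zero_fraction(counts):
    """Share of taxa with a zero count in one sample."""
    return sum(1 for c in counts if c == 0) / len(counts)

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon counts.

    Zeros have no logarithm, so a pseudocount is added first. The value
    0.5 is an illustrative assumption; other zero-replacement strategies exist.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    # Subtracting the mean log equals dividing by the geometric mean.
    return [v - mean_log for v in logs]

# Toy sample (hypothetical counts for five taxa).
sample = [120, 0, 30, 0, 850]
print("fraction of zeros:", zero_fraction(sample))
print("CLR values:", [round(v, 3) for v in clr(sample)])
```

CLR values always sum to zero, and for strictly positive counts with no pseudocount they are unchanged when every count is multiplied by the same factor, which is why the transform suits data where only relative abundances are informative.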


Interactive exploratory data analysis of Integrative Human Microbiome Project data using Metaviz

F1000Research ◽

10.12688/f1000research.24345.2 ◽

2021 ◽

Vol 9 ◽

pp. 601

Author(s):

Justin Wagner ◽

Jayaram Kancherla ◽

Domenick Braccia ◽

James Matsumara ◽

Victor Felix ◽

...

Keyword(s):

Data Analysis ◽

Human Microbiome ◽

Disease Status ◽

Human Microbiome Project ◽

Second Phase ◽

Bioconductor Package ◽

The Rich ◽

Rich Data ◽

Project Data ◽

Microbiome Data

The rich data produced by the second phase of the Human Microbiome Project (iHMP) offer a unique opportunity to test hypotheses that interactions between microbial communities and a human host might impact an individual’s health or disease status. In this work we describe infrastructure that integrates Metaviz, an interactive microbiome data analysis and visualization tool, with the iHMP Data Coordination Center web portal and the HMP2Data R/Bioconductor package. We describe integrative statistical and visual analyses of two datasets from iHMP using Metaviz along with the metagenomeSeq R/Bioconductor package for statistical analysis of differential abundance. These use cases demonstrate the utility of a combined approach to access and analyze data from this resource.


Microbial trend analysis for common dynamic trend, group comparison and classification in longitudinal microbiome study

10.1101/2020.01.30.926824 ◽

2020 ◽

Author(s):

Chan Wang ◽

Jiyuan Hu ◽

Martin J. Blaser ◽

Huilin Li

Keyword(s):

Trend Analysis ◽

Critical Role ◽

Human Microbiome ◽

Real Data ◽

Microbial Dynamics ◽

Individual Subject ◽

Microbial Profiling ◽

Health And Disease ◽

Microbiome Data

Abstract

Motivation: The human microbiome is inherently dynamic, and its dynamic nature plays a critical role in maintaining health and driving disease. With an increasing number of longitudinal microbiome studies, scientists are eager to characterize microbial dynamics comprehensively and to understand their implications for health and disease-related phenotypes. However, due to the challenging structure of longitudinal microbiome data, few analytic methods are available to characterize microbial dynamics over time.

Results: We propose a microbial trend analysis (MTA) framework for high-dimensional and phylogenetically based longitudinal microbiome data. In particular, MTA can perform three tasks: 1) capture the common microbial dynamic trends for a group of subjects at the community level and identify the dominant taxa; 2) examine whether the overall microbial dynamic trends differ significantly between groups; 3) classify an individual subject based on its longitudinal microbial profile. Our extensive simulations demonstrate that the proposed MTA framework is robust and powerful in hypothesis testing, taxon identification, and subject classification. Our real data analyses further illustrate the utility of MTA through a longitudinal study in mice.

Conclusions: The proposed MTA framework is an attractive and effective tool for investigating dynamic microbial patterns in longitudinal microbiome studies.


Tree-Aggregated Predictive Modeling of Microbiome Data

10.1101/2020.09.01.277632 ◽

2020 ◽

Author(s):

Jacob Bien ◽

Xiaohan Yan ◽

Léo Simpson ◽

Christian L. Müller

Keyword(s):

Data Analysis ◽

Predictive Modeling ◽

Large Scale ◽

High Throughput Sequencing ◽

Compositional Data ◽

Low Cost ◽

Primary Data ◽

Compositional Data Analysis ◽

Taxonomic Rank ◽

Microbiome Data

Abstract: Modern high-throughput sequencing technologies provide low-cost microbiome survey data across all habitats of life at unprecedented scale. At the most granular level, the primary data consist of sparse counts of amplicon sequence variants or operational taxonomic units that are associated with taxonomic and phylogenetic group information. In this contribution, we leverage the hierarchical structure of amplicon data and propose a data-driven, parameter-free, and scalable tree-guided aggregation framework to associate microbial subcompositions with response variables of interest. The excess number of zero or low count measurements at the read level forces traditional microbiome data analysis workflows to remove rare sequencing variants or group them by a fixed taxonomic rank, such as genus or phylum, or by phylogenetic similarity. By contrast, our framework, which we call trac (tree-aggregation of compositional data), learns data-adaptive taxon aggregation levels for predictive modeling making user-defined aggregation obsolete while simultaneously integrating seamlessly into the compositional data analysis framework. We illustrate the versatility of our framework in the context of large-scale regression problems in human-gut, soil, and marine microbial ecosystems. We posit that the inferred aggregation levels provide highly interpretable taxon groupings that can help microbial ecologists gain insights into the structure and functioning of the underlying ecosystem of interest.
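
The abstract contrasts trac with the traditional workflow of grouping sequence variants at a fixed taxonomic rank. The sketch below illustrates only that fixed-rank baseline, not trac itself: it sums per-variant counts over all variants that share a lineage down to a chosen rank. The rank list, variant names, lineages, and counts are hypothetical toy values.

```python
from collections import defaultdict

# Ranks stored in each lineage tuple, from coarse to fine (illustrative subset).
RANKS = ("phylum", "family", "genus")

def aggregate_by_rank(counts, lineages, rank):
    """Sum variant counts over all variants sharing a lineage down to `rank`."""
    depth = RANKS.index(rank) + 1
    totals = defaultdict(int)
    for variant, count in counts.items():
        # Key on the lineage prefix so equal names under different parents stay apart.
        totals[lineages[variant][:depth]] += count
    return dict(totals)

# Toy taxonomy and sparse counts for four sequence variants (hypothetical).
lineages = {
    "asv1": ("Firmicutes", "Lachnospiraceae", "Blautia"),
    "asv2": ("Firmicutes", "Lachnospiraceae", "Roseburia"),
    "asv3": ("Firmicutes", "Ruminococcaceae", "Faecalibacterium"),
    "asv4": ("Bacteroidetes", "Bacteroidaceae", "Bacteroides"),
}
counts = {"asv1": 3, "asv2": 0, "asv3": 12, "asv4": 40}

for rank in RANKS:
    print(rank, aggregate_by_rank(counts, lineages, rank))
```

Moving from genus to phylum removes zeros but also discards resolution uniformly across the tree. According to the abstract, trac instead learns the aggregation level from the data.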


G2S: A New Deep Learning Tool for Predicting Stool Microbiome Structure From Oral Microbiome Data

Frontiers in Genetics ◽

10.3389/fgene.2021.644516 ◽

2021 ◽

Vol 12 ◽

Author(s):

Simone Rampelli ◽

Marco Fabbrini ◽

Marco Candela ◽

Elena Biagi ◽

Patrizia Brigidi ◽

...

Keyword(s):

Deep Learning ◽

Human Microbiome ◽

Oral Microbiome ◽

Fecal Microbiome ◽

The Family ◽

Bioinformatic Tool ◽

Microbial Metagenomics ◽

Community Theory ◽

Microbiome Data ◽

Fecal Sampling

Deep learning methodologies have revolutionized prediction in many fields and show the potential to do the same in microbial metagenomics. However, deep learning is still largely unexplored in the field of microbiology, with only a few software tools designed to work with microbiome data. Within the meta-community theory, we foresee new perspectives for the development and application of deep learning algorithms in the field of the human microbiome. In this context, we developed G2S, a bioinformatic tool for taxonomic prediction of the human fecal microbiome directly from the oral microbiome data of the same individual. The tool uses a deep convolutional neural network trained on paired oral and fecal samples from populations across the globe, which allows the stool microbiome to be inferred at the family level more accurately than with other available approaches. The tool can be used in retrospective studies where fecal sampling was not performed, and especially in the field of paleomicrobiology, as a unique opportunity to recover data related to ancient gut microbiome configurations. G2S was validated on already characterized oral and fecal sample pairs and then applied to ancient microbiome data from dental calculi to derive putative intestinal components in medieval subjects.


Misclassification of a whole genome sequence reference defined by the Human Microbiome Project: a detrimental carryover effect to microbiome studies

10.1101/19000489 ◽

2019 ◽

Author(s):

DJ Darwin R. Bandoy ◽

B Carol Huang ◽

Bart C. Weimer

Keyword(s):

Human Microbiome ◽

Human Microbiome Project ◽

Outbreak Detection ◽

Whole Genome Sequence ◽

Reference Database ◽

Whole Genome ◽

Reference Species ◽

Genome Sequences ◽

Genome Homology ◽

Microbiome Data

Abstract: Taxonomic classification is an essential step in the analysis of microbiome data and depends on a reference database of whole genome sequences. Taxonomic classifiers are built on established reference species, such as those in the Human Microbiome Project database, which is growing rapidly. While constructing a population-wide pangenome of the bacterium Hungatella, we discovered that the Human Microbiome Project reference species Hungatella hathewayi (WAL 18680) was significantly different from other members of this genus. Specifically, the reference lacked the core genome shared by the other members. Further analysis, using average nucleotide identity (ANI) and 16S rRNA comparisons, indicated that WAL 18680 was misclassified as Hungatella. The classification error is being amplified in taxonomic classifiers and will have a compounding effect as microbiome analyses are done, resulting in inaccurate assignment of community members and leading to fallacious conclusions and possibly inappropriate treatment. As automated genome homology assessment expands for microbiome analysis and outbreak detection, and as public health reliance on whole genomes increases, this issue will likely occur at an increasing rate. These observations highlight the need to develop reference-free methods for epidemiological investigation using whole genome sequences, and the critical importance of accurate reference databases.


Nephele: a cloud platform for simplified, standardized and reproducible microbiome data analysis

Bioinformatics ◽

10.1093/bioinformatics/btx617 ◽

2017 ◽

Vol 34 (8) ◽

pp. 1411-1413 ◽

Cited By ~ 22

Author(s):

Nick Weber ◽

David Liou ◽

Jennifer Dommer ◽

Philip MacMenamin ◽

Mariam Quiñones ◽

...

Keyword(s):

Data Analysis ◽

Cloud Platform ◽

Microbiome Data ◽

Microbiome Data Analysis


A Bayesian zero-inflated negative binomial regression model for the integrative analysis of microbiome data

Biostatistics ◽

10.1093/biostatistics/kxz050 ◽

2019 ◽

Author(s):

Shuang Jiang ◽

Guanghua Xiao ◽

Andrew Y Koh ◽

Jiwoong Kim ◽

Qiwei Li ◽

...

Keyword(s):

Regression Model ◽

Negative Binomial ◽

Human Microbiome ◽

Simulated Data ◽

Negative Binomial Regression ◽

Bayesian Regression ◽

Negative Binomial Regression Model ◽

Disease States ◽

Binomial Regression ◽

Microbiome Data

Summary Microbiome omics approaches can reveal intriguing relationships between the human microbiome and certain disease states. Along with the identification of specific bacterial taxa associated with diseases, recent scientific advancements provide mounting evidence that metabolism, genetics, and environmental factors can all modulate these microbial effects. However, current methods for integrating microbiome data and other covariates are severely lacking. Hence, we present an integrative Bayesian zero-inflated negative binomial regression model that can both distinguish differentially abundant taxa with distinct phenotypes and quantify covariate-taxa effects. Our model demonstrates good performance on simulated data. Furthermore, we successfully integrated microbiome taxonomies and metabolomics in two real microbiome datasets to provide biologically interpretable findings. In all, we propose a novel integrative Bayesian regression model that features bacterial differential abundance analysis and quantification of microbiome-covariate effects, which makes it suitable for general microbiome studies.
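
To make the term "zero-inflated negative binomial" concrete, the sketch below evaluates a generic ZINB probability mass function: with probability `pi_zero` a count is a structural zero, and otherwise it is drawn from a negative binomial with mean `mu` and dispersion `size`. This is a textbook parameterization assumed here for illustration; it is not the authors' Bayesian regression model, and the function and parameter names are hypothetical.

```python
import math

def nb_pmf(k, mu, size):
    """Negative binomial pmf with mean `mu` > 0 and dispersion `size` > 0."""
    log_p = (
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * math.log(size / (size + mu))
        + k * math.log(mu / (size + mu))
    )
    return math.exp(log_p)

def zinb_pmf(k, mu, size, pi_zero):
    """Zero-inflated NB: extra point mass `pi_zero` at zero, NB otherwise."""
    base = (1.0 - pi_zero) * nb_pmf(k, mu, size)
    return pi_zero + base if k == 0 else base

# Illustrative parameter values (assumptions, not estimates from any dataset).
mu, size, pi_zero = 5.0, 2.0, 0.3
print("P(Y = 0) under NB:  ", nb_pmf(0, mu, size))
print("P(Y = 0) under ZINB:", zinb_pmf(0, mu, size, pi_zero))
```

The zero-inflation term raises the probability of observing a zero above what the negative binomial alone allows, and it lowers the mean count to `(1 - pi_zero) * mu`.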


Successful strategies for human microbiome data generation, storage and analyses

Journal of Biosciences ◽

10.1007/s12038-019-9934-y ◽

2019 ◽

Vol 44 (5) ◽

Author(s):

Susan Holmes

Keyword(s):

Human Microbiome ◽

Data Generation ◽

Microbiome Data
