A phylogenetic transform enhances analysis of compositional microbiota data

Surveys of microbial communities (microbiota), typically measured as relative abundance of species, have illustrated the importance of these communities in human health and disease. Yet, statistical artifacts commonly plague the analysis of relative abundance data. Here, we introduce the PhILR transform, which incorporates microbial evolutionary models with the isometric log-ratio transform to allow off-the-shelf statistical tools to be safely applied to microbiota surveys. We demonstrate that analyses of community-level structure can be applied to PhILR transformed data with performance on benchmarks rivaling or surpassing standard tools. Additionally, by decomposing distance in the PhILR transformed space, we identified neighboring clades that may have adapted to distinct human body sites. Decomposing variance revealed that covariation of bacterial clades within human body sites increases with phylogenetic relatedness. Together, these findings illustrate how the PhILR transform combines statistical and phylogenetic models to overcome compositional data challenges and enable evolutionary insights relevant to microbial communities.
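As a rough illustration of the isometric log-ratio idea the abstract refers to, the sketch below computes one "balance" per internal node of a small binary tree. The four-taxon tree, the abundance values, and the function names are hypothetical, and the sketch ignores any weighting scheme the authors may apply; it shows only the standard unweighted balance formula, not the PhILR implementation itself.

```python
import math

def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def balance(left, right):
    """Unweighted ILR balance between two groups of strictly positive parts.

    left, right: relative abundances of the taxa descending from the two
    children of one internal node of the tree.
    """
    r, s = len(left), len(right)
    scale = math.sqrt(r * s / (r + s))
    return scale * math.log(geometric_mean(left) / geometric_mean(right))

# Hypothetical four-taxon community with tree ((A, B), (C, D)).
abundances = {"A": 0.1, "B": 0.2, "C": 0.3, "D": 0.4}

# One coordinate per internal node: the root, then each two-taxon clade.
root = balance([abundances["A"], abundances["B"]],
               [abundances["C"], abundances["D"]])
left_clade = balance([abundances["A"]], [abundances["B"]])
right_clade = balance([abundances["C"]], [abundances["D"]])
coords = [root, left_clade, right_clade]
```

Four taxa yield three coordinates, and each coordinate depends only on ratios between taxa, so multiplying every abundance by a constant leaves `coords` unchanged.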


A phylogenetic transform enhances analysis of compositional microbiota data

10.1101/072413 ◽

2016 ◽

Cited By ~ 3

Author(s):

Justin D Silverman ◽

Alex Washburne ◽

Sayan Mukherjee ◽

Lawrence A David

Keyword(s):

Microbial Communities ◽

Relative Abundance ◽

Dominant Role ◽

Phylogenetic Information ◽

Abundance Data ◽

Sequencing Technologies ◽

Environmental Selection ◽

Health And Disease ◽

High Throughput DNA Sequencing ◽

Spurious Results

ABSTRACT High-throughput DNA sequencing technologies have revolutionized the study of microbial communities (microbiota) and have revealed their importance in both human health and disease. However, due to technical limitations, data from microbiota surveys reflect the relative abundance of bacterial taxa and not their absolute levels. It is well known that applying common statistical methods, such as correlation or hypothesis testing, to relative abundance data can lead to spurious results. Here, we introduce the PhILR transform, a data transform that utilizes microbial phylogenetic information. This transform enables off-the-shelf statistical tools to be applied to microbiota surveys free from artifacts usually associated with analysis of relative abundance data. Using environmental and human-associated microbial community datasets as benchmarks, we find that the PhILR transform significantly improves the performance of distance-based and machine learning-based statistics, boosting the accuracy of widely used algorithms on reference benchmarks by 90%. Because the PhILR transform relies on bacterial phylogenies, statistics applied in the PhILR coordinate system are also framed within an evolutionary perspective. Regression on PhILR transformed human microbiota data identified evolutionarily neighboring bacterial clades that may have differentiated to adapt to distinct body sites. Variance statistics showed that the degree of covariation of bacterial clades across human body sites tended to increase with phylogenetic relatedness between clades. These findings support the hypothesis that environmental selection, not competition between bacteria, plays a dominant role in structuring human-associated microbial communities.


Compositional Data Analysis of Periodontal Disease Microbial Communities

Frontiers in Microbiology ◽

10.3389/fmicb.2021.617949 ◽

2021 ◽

Vol 12 ◽

Author(s):

Laura Sisk-Hackworth ◽

Adrian Ortiz-Velez ◽

Micheal B. Reed ◽

Scott T. Kelley

Keyword(s):

Data Analysis ◽

Periodontal Disease ◽

Microbial Communities ◽

Compositional Data ◽

Oral Microbiome ◽

Compositional Data Analysis ◽

Pocket Depth ◽

Metabolomics Data ◽

Standard Data ◽

Log Ratio

Periodontal disease (PD) is a chronic, progressive polymicrobial disease that induces a strong host immune response. Culture-independent methods, such as next-generation sequencing (NGS) of bacterial 16S amplicon and shotgun metagenomic libraries, have greatly expanded our understanding of PD biodiversity, identified novel PD microbial associations, and shown that PD biodiversity increases with pocket depth. NGS studies have also found PD communities to be highly host-specific in terms of both biodiversity and the response of microbial communities to periodontal treatment. As with most microbiome work, the majority of PD microbiome studies use standard data normalization procedures that do not account for the compositional nature of NGS microbiome data. Here, we apply recently developed compositional data analysis (CoDA) approaches and software tools to reanalyze multiomics (16S, metagenomics, and metabolomics) data generated from previously published periodontal disease studies. CoDA methods, such as centered log-ratio (clr) transformation, compensate for the compositional nature of these data, which can not only remove spurious correlations but also allow for the identification of novel associations between microbial features and disease conditions. We validated many of the studies’ original findings, but also identified new features associated with periodontal disease, including the genera Schwartzia and Aerococcus and the cytokine C-reactive protein (CRP). Furthermore, our network analysis revealed a lower connectivity among taxa in deeper periodontal pockets, potentially indicative of a more “random” microbiome. Our findings illustrate the utility of CoDA techniques in multiomics compositional data analysis of the oral microbiome.
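The centered log-ratio (clr) transformation mentioned in this abstract can be sketched in a few lines. The read counts below are hypothetical, and the pseudocount used to handle zeros is an assumption of this sketch rather than something the abstract specifies; the study's own software may treat zeros differently.

```python
import math

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample.

    pseudocount is an assumption of this sketch: it keeps log() defined
    when a taxon has zero reads.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    # Subtracting the mean log is the same as dividing by the geometric mean.
    return [v - mean_log for v in logs]

sample = [120, 30, 0, 850]   # hypothetical read counts for four taxa
transformed = clr(sample)
```

The transformed values always sum to zero, and with `pseudocount=0` the result does not change when every count is multiplied by the same factor, which is why sequencing depth drops out of the analysis.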


Biogeography & environmental conditions shape bacteriophage-bacteria networks across the human microbiome

10.1101/144642 ◽

2017 ◽

Author(s):

Geoffrey D Hannigan ◽

Melissa B Duhaime ◽

Danai Koutra ◽

Patrick D Schloss

Keyword(s):

Microbial Communities ◽

Human Body ◽

Complex Dynamics ◽

Learning Algorithm ◽

Human Microbiome ◽

Metagenomic Sequence ◽

Future Studies ◽

Disease States ◽

Health And Disease ◽

Extinction Events

Abstract Viruses and bacteria are critical components of the human microbiome and play important roles in health and disease. Most previous work has relied on studying bacteria and viruses independently, thereby reducing them to two separate communities. Such approaches are unable to capture how these microbial communities interact, such as through processes that maintain community robustness or allow phage-host populations to co-evolve. We implemented a network-based analytical approach to describe phage-bacteria network diversity throughout the human body. We built these community networks using a machine learning algorithm to predict which phages could infect which bacteria in a given microbiome. Our algorithm was applied to paired viral and bacterial metagenomic sequence sets from three previously published human cohorts. We organized the predicted interactions into networks that allowed us to evaluate phage-bacteria connectedness across the human body. We observed evidence that gut and skin network structures were person-specific and not conserved among cohabitating family members. High-fat diets appeared to be associated with less connected networks. Network structure differed between skin sites, with those exposed to the external environment being less connected and likely more susceptible to network degradation by microbial extinction events. This study quantified and contrasted the diversity of virome-microbiome networks across the human body and illustrated how environmental factors may influence phage-bacteria interactive dynamics. This work provides a baseline for future studies to better understand system perturbations, such as disease states, through ecological networks.

Author Summary The human microbiome, the collection of microbial communities that colonize the human body, is a crucial component to health and disease. Two major components of the human microbiome are the bacterial and viral communities. These communities have primarily been studied separately using metrics of community composition and diversity. These approaches have failed to capture the complex dynamics of interacting bacteria and phage communities, which frequently share genetic information and work together to maintain ecosystem homeostasis (e.g. kill-the-winner dynamics). Removal of bacteria or phage can disrupt or even collapse those ecosystems. Relationship-based network approaches allow us to capture this interaction information. Using this network-based approach with three independent human cohorts, we were able to present an initial understanding of how phage-bacteria networks differ throughout the human body, so as to provide a baseline for future studies of how and why microbiome networks differ in disease states.


A gene co-association network regulating gut microbial communities in a Duroc pig population

Microbiome ◽

10.1186/s40168-020-00994-8 ◽

2021 ◽

Vol 9 (1) ◽

Author(s):

Antonio Reverter ◽

Maria Ballester ◽

Pamela A. Alexandre ◽

Emilio Mármol-Sánchez ◽

Antoni Dalmau ◽

...

Keyword(s):

Microbial Communities ◽

Candidate Genes ◽

Relative Abundance ◽

Association Studies ◽

Single Gene ◽

Host Genome ◽

Genome Wide Association Studies ◽

Vaccine Response ◽

Microbiome Composition ◽

Host Genetic

Abstract

Background: Analyses of gut microbiome composition in livestock species have shown its potential to contribute to the regulation of complex phenotypes. However, little is known about the host genetic control over the gut microbial communities. In pigs, previous studies are based on classical “single-gene-single-trait” approaches and have evaluated the role of host genome controlling gut prokaryote and eukaryote communities separately.

Results: In order to determine the ability of the host genome to control the diversity and composition of microbial communities in healthy pigs, we undertook genome-wide association studies (GWAS) for 39 microbial phenotypes that included 2 diversity indexes, and the relative abundance of 31 bacterial and six commensal protist genera in 390 pigs genotyped for 70 K SNPs. The GWAS results were processed through a 3-step analytical pipeline comprised of (1) association weight matrix; (2) regulatory impact factor; and (3) partial correlation and information theory. The inferred gene regulatory network comprised 3561 genes (within a 5 kb distance from a relevant SNP–P < 0.05) and 738,913 connections (SNP-to-SNP co-associations). Our findings highlight the complexity and polygenic nature of the pig gut microbial ecosystem. Prominent within the network were 5 regulators, PRDM15, STAT1, ssc-mir-371, SOX9 and RUNX2 which gathered 942, 607, 588, 284 and 273 connections, respectively. PRDM15 modulates the transcription of upstream regulators of WNT and MAPK-ERK signaling to safeguard naive pluripotency and regulates the production of Th1- and Th2-type immune response. The signal transducer STAT1 has long been associated with immune processes and was recently identified as a potential regulator of vaccine response to porcine reproductive and respiratory syndrome. The list of regulators was enriched for immune-related pathways, and the list of predicted targets includes candidate genes previously reported as associated with microbiota profile in pigs, mice and humans, such as SLIT3, SLC39A8, NOS1, IL1R2, DAB1, TOX3, SPP1, THSD7B, ELF2, PIANP, A2ML1, and IFNAR1. Moreover, we show the existence of host-genetic variants jointly associated with the relative abundance of butyrate producer bacteria and host performance.

Conclusions: Taken together, our results identified regulators, candidate genes, and mechanisms linked with microbiome modulation by the host. They further highlight the value of the proposed analytical pipeline to exploit pleiotropy and the crosstalk between bacteria and protists as significant contributors to host-microbiome interactions and identify genetic markers and candidate genes that can be incorporated in breeding programs to improve host-performance and microbial traits.


Hydrogen sulfide: stench from the past as a mediator of the future

The Biochemist ◽

10.1042/bio03805012 ◽

2016 ◽

Vol 38 (5) ◽

pp. 12-17 ◽

Cited By ~ 3

Author(s):

Jasmina Zivanovic ◽

Milos R. Filipovic

Keyword(s):

Blood Pressure ◽

Hydrogen Sulfide ◽

Human Body ◽

The Past ◽

Ability To Act ◽

Signalling Molecule ◽

Actual Mechanism ◽

Growing Body ◽

Health And Disease ◽

Pharmacological Potential

The past decade has witnessed the discovery of hydrogen sulfide (H2S) as a new signalling molecule. Its ability to act as a neurotransmitter, regulator of blood pressure, immunomodulator or anti-apoptotic agent, together with its great pharmacological potential, is now well established. Notwithstanding the growing body of evidence showing the biological roles of H2S, the gap between these roles and the actual mechanism(s) behind these processes is getting larger. We propose a way that protein cysteine residues can be modified to form protein persulfides (P-SSH) and explain how this process is controlled in a physiologically relevant fashion. This article provides an overview of H2S signalling in the human body with particular emphasis on the latest discoveries regarding the mechanisms of protein persulfidation and depersulfidation, as well as about the biological reactivity of persulfides and their role in health and disease.


Log-Ratio and Parallel Factor Analysis: An Approach to Analyze Three-Way Compositional Data

Advanced Dynamic Modeling of Economic and Social Systems - Studies in Computational Intelligence ◽

10.1007/978-3-642-32903-6_15 ◽

2013 ◽

pp. 209-221 ◽

Cited By ~ 5

Author(s):

Michele Gallo

Keyword(s):

Factor Analysis ◽

Compositional Data ◽

Parallel Factor Analysis ◽

Parallel Factor ◽

Log Ratio


Methods for Characterizing Microbial Communities Associated with the Human Body

The Human Microbiota ◽

10.1002/9781118409855.ch2 ◽

2013 ◽

pp. 51-74 ◽

Cited By ~ 3

Author(s):

Christine Bassis ◽

Vincent Young ◽

Thomas Schmidt

Keyword(s):

Microbial Communities ◽

Human Body


Soil Depth Determines the Composition and Diversity of Bacterial and Archaeal Communities in a Poplar Plantation

Forests ◽

10.3390/f10070550 ◽

2019 ◽

Vol 10 (7) ◽

pp. 550 ◽

Cited By ~ 6

Author(s):

Huili Feng ◽

Jiahuan Guo ◽

Weifeng Wang ◽

Xinzhang Song ◽

Shuiqiang Yu

Keyword(s):

16S rRNA ◽

16S rRNA Gene ◽

Microbial Communities ◽

Relative Abundance ◽

rRNA Gene ◽

Poplar Plantation ◽

Plantation Forest ◽

Bacterial And Archaeal Communities ◽

Different Depths ◽

Archaeal Communities

Understanding the composition and diversity of soil microorganisms that typically mediate the soil biogeochemical cycle is crucial for estimating greenhouse gas flux and mitigating global changes in plantation forests. Therefore, the objectives of this study were to investigate changes in diversity and relative abundance of bacteria and archaea with soil profiles and the potential factors influencing the vertical differentiation of microbial communities in a poplar plantation. We investigated soil bacterial and archaeal community compositions and diversities by 16S rRNA gene Illumina MiSeq sequencing at different depths of a poplar plantation forest in Chenwei forest farm, Sihong County, Jiangsu, China. More than 882,422 quality-filtered 16S rRNA gene sequences were obtained from 15 samples, corresponding to 34 classified phyla and 68 known classes. Ten major bacterial phyla and two archaeal phyla were found. The diversity of bacterial and archaeal communities decreased with depth of the plantation soil. Analysis of variance (ANOVA) of relative abundance of microbial communities exhibited that Nitrospirae, Verrucomicrobia, Latescibacteria, GAL15, SBR1093, and Euryarchaeota had significant differences at different depths. The transition zone of the community composition between the surface and subsurface occurred at 10–20 cm. Overall, our findings highlighted the importance of depth with regard to the complexity and diversity of microbial community composition in plantation forest soils.


Variable selection in microbiome compositional data analysis

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqaa029 ◽

2020 ◽

Vol 2 (2) ◽

Cited By ~ 2

Author(s):

Antoni Susin ◽

Yiwen Wang ◽

Kim-Anh Lê Cao ◽

M Luz Calle

Keyword(s):

Data Analysis ◽

Variable Selection ◽

Compositional Data ◽

Penalized Regression ◽

Compositional Data Analysis ◽

Forward Selection ◽

Computationally Efficient ◽

Parsimonious Model ◽

Microbiome Data ◽

Log Ratio

Abstract Though variable selection is one of the most relevant tasks in microbiome analysis, e.g. for the identification of microbial signatures, many studies still rely on methods that ignore the compositional nature of microbiome data. The applicability of compositional data analysis methods has been hampered by the availability of software and the difficulty in interpreting their results. This work is focused on three methods for variable selection that acknowledge the compositional structure of microbiome data: selbal, a forward selection approach for the identification of compositional balances, and clr-lasso and coda-lasso, two penalized regression models for compositional data analysis. This study highlights the link between these methods and brings out some limitations of the centered log-ratio transformation for variable selection. In particular, the fact that it is not subcompositionally consistent makes the microbial signatures obtained from clr-lasso not readily transferable. Coda-lasso is computationally efficient and suitable when the focus is the identification of the most associated microbial taxa. Selbal stands out when the goal is to obtain a parsimonious model with optimal prediction performance, but it is computationally greedy. We provide a reproducible vignette for the application of these methods that will enable researchers to fully leverage their potential in microbiome studies.


Counts: an outstanding challenge for log-ratio analysis of compositional data in the molecular biosciences

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqaa040 ◽

2020 ◽

Vol 2 (2) ◽

Cited By ~ 1

Author(s):

David R Lovell ◽

Xin-Yi Chua ◽

Annette McGrath

Keyword(s):

Count Data ◽

Compositional Data ◽

Compositional Data Analysis ◽

Ratio Analysis ◽

Sequencing Technology ◽

Scale Invariant ◽

Measurement And Analysis ◽

Discrete Nature ◽

The Impact ◽

Log Ratio

Abstract Thanks to sequencing technology, modern molecular bioscience datasets are often compositions of counts, e.g. counts of amplicons, mRNAs, etc. While there is growing appreciation that compositional data need special analysis and interpretation, less well understood is the discrete nature of these count compositions (or, as we call them, lattice compositions) and the impact this has on statistical analysis, particularly log-ratio analysis (LRA) of pairwise association. While LRA methods are scale-invariant, count compositional data are not; consequently, the conclusions we draw from LRA of lattice compositions depend on the scale of counts involved. We know that additive variation affects the relative abundance of small counts more than large counts; here we show that additive (quantization) variation comes from the discrete nature of count data itself, as well as (biological) variation in the system under study and (technical) variation from measurement and analysis processes. Variation due to quantization is inevitable, but its impact on conclusions depends on the underlying scale and distribution of counts. We illustrate the different distributions of real molecular bioscience data from different experimental settings to show why it is vital to understand the distributional characteristics of count data before applying and drawing conclusions from compositional data analysis methods.
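The claim that additive variation affects small counts more than large counts can be checked with a small worked example. The counts and the helper name below are hypothetical; the sketch simply compares how much a log-ratio moves when a single extra read lands on a small count versus a large one.

```python
import math

def log_ratio_shift(a, b, extra=1):
    """Change in log(a / b) when `extra` counts are added to a."""
    return math.log((a + extra) / b) - math.log(a / b)

# The same 1:10 ratio observed at two hypothetical sequencing depths.
shallow = log_ratio_shift(2, 20)         # one extra read on a count of 2
deep = log_ratio_shift(2000, 20000)      # one extra read on a count of 2000
```

Here `shallow` is log(1.5), roughly 0.405, while `deep` is log(1.0005), roughly 0.0005. The underlying ratio is identical in both cases, yet the conclusion drawn from the log-ratio is far more sensitive to a single count at the shallow depth.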
