A Latent Allocation Model for the Analysis of Microbial Composition and Disease

2018 ◽  
Author(s):  
Ko Abe ◽  
Masaaki Hirayama ◽  
Kinji Ohno ◽  
Teppei Shimamura

Abstract Background: Establishing the relationship between microbiota and specific disease is important but requires appropriate statistical methodology. A specialized feature of microbiome count data is the presence of a large number of zeros, which makes it difficult to analyze in case-control studies. Most existing approaches either add a small number called a pseudo-count or use probability models such as the multinomial and Dirichlet-multinomial distributions to explain the excess zero counts, which may produce unnecessary biases and impose a correlation structure that is unsuitable for microbiome data. Results: The purpose of this article is to develop a new probabilistic model, called BERMUDA (BERnoulli and MUltinomial Distribution-based latent Allocation), to address these problems. BERMUDA enables us to describe the differences in bacteria composition and a certain disease among samples. We also provide a simple and efficient learning procedure for the proposed model using an annealing EM algorithm. Conclusion: We illustrate the performance of the proposed method through both simulation and real data analysis. BERMUDA is implemented in R and is available from GitHub (https://github.com/abikoushi/Bermuda).
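The pseudo-count issue mentioned in this abstract can be illustrated with a small sketch. This is not BERMUDA and not code from the paper; the function name, the read counts, and the two pseudo-count values are all hypothetical, and the pseudo-count is added once to the numerator and once to the denominator as a simplification.

```python
import math

def log_relative_abundance(count, library_size, pseudo):
    """Log of (count + pseudo) / (library_size + pseudo).

    `pseudo` is an arbitrary analyst choice (an assumption here);
    without it, a zero count has no finite logarithm.
    """
    return math.log((count + pseudo) / (library_size + pseudo))

# A taxon with zero reads in a sample of 100 reads (made-up numbers).
for pseudo in (0.5, 1.0):
    print(pseudo, log_relative_abundance(0, 100, pseudo))
```

For the zero-count taxon, the two pseudo-counts give log abundances that differ by roughly 0.69, so the analyst's arbitrary choice shifts the result. That is one way the bias described in the abstract can arise.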

F1000Research ◽  
2016 ◽  
Vol 5 ◽  
pp. 1492 ◽  
Author(s):  
Ben J. Callahan ◽  
Kris Sankaran ◽  
Julia A. Fukuyama ◽  
Paul J. McMurdie ◽  
Susan P. Holmes

High-throughput sequencing of PCR-amplified taxonomic markers (like the 16S rRNA gene) has enabled a new level of analysis of complex bacterial communities known as microbiomes. Many tools exist to quantify and compare abundance levels or microbial composition of communities in different conditions. The sequencing reads have to be denoised and assigned to the closest taxa from a reference database. Common approaches use a notion of 97% similarity and normalize the data by subsampling to equalize library sizes. In this paper, we show that statistical models allow more accurate abundance estimates. By providing a complete workflow in R, we enable the user to do sophisticated downstream statistical analyses, including both parametric and nonparametric methods. We provide examples of using the R packages dada2, phyloseq, DESeq2, ggplot2 and vegan to filter, visualize and test microbiome data. We also provide examples of supervised analyses using random forests, partial least squares and linear models as well as nonparametric testing using community networks and the ggnetwork package.


2020 ◽  
Vol 16 (12) ◽  
pp. e1008473
Author(s):  
Pamela N. Luna ◽  
Jonathan M. Mansbach ◽  
Chad A. Shaw

Changes in the composition of the microbiome over time are associated with myriad human illnesses. Unfortunately, the lack of analytic techniques has hindered researchers’ ability to quantify the association between longitudinal microbial composition and time-to-event outcomes. Prior methodological work developed the joint model for longitudinal and time-to-event data to incorporate time-dependent biomarker covariates into the hazard regression approach to disease outcomes. The original implementation of this joint modeling approach employed a linear mixed effects model to represent the time-dependent covariates. However, when the distribution of the time-dependent covariate is non-Gaussian, as is the case with microbial abundances, researchers require different statistical methodology. We present a joint modeling framework that uses a negative binomial mixed effects model to determine longitudinal taxon abundances. We incorporate these modeled microbial abundances into a hazard function with a parameterization that not only accounts for the proportional nature of microbiome data, but also generates biologically interpretable results. Herein we demonstrate the performance improvements of our approach over existing alternatives via simulation as well as a previously published longitudinal dataset studying the microbiome during pregnancy. The results demonstrate that our joint modeling framework for longitudinal microbiome count data provides a powerful methodology to uncover associations between changes in microbial abundances over time and the onset of disease. This method offers the potential to equip researchers with a deeper understanding of the associations between longitudinal microbial composition changes and disease outcomes. This new approach could potentially lead to new diagnostic biomarkers or inform clinical interventions to help prevent or treat disease.


2020 ◽  
Author(s):  
Chan Wang ◽  
Jiyuan Hu ◽  
Martin J. Blaser ◽  
Huilin Li

Abstract Motivation: The human microbiome is inherently dynamic and its dynamic nature plays a critical role in maintaining health and driving disease. With an increasing number of longitudinal microbiome studies, scientists are eager to learn the comprehensive characterization of microbial dynamics and their implications for health and disease-related phenotypes. However, due to the challenging structure of longitudinal microbiome data, few analytic methods are available to characterize the microbial dynamics over time. Results: We propose a microbial trend analysis (MTA) framework for the high-dimensional and phylogenetically-based longitudinal microbiome data. In particular, MTA can perform three tasks: 1) capture the common microbial dynamic trends for a group of subjects on the community level and identify the dominant taxa; 2) examine whether or not the microbial overall dynamic trends are significantly different in groups; 3) classify an individual subject based on its longitudinal microbial profiling. Our extensive simulations demonstrate that the proposed MTA framework is robust and powerful in hypothesis testing, taxon identification, and subject classification. Our real data analyses further illustrate the utility of MTA through a longitudinal study in mice. Conclusions: The proposed MTA framework is an attractive and effective tool in investigating dynamic microbial patterns from longitudinal microbiome studies.


Mathematics ◽  
2020 ◽  
Vol 8 (4) ◽  
pp. 604 ◽  
Author(s):  
Victor Korolev ◽  
Andrey Gorshenin

Mathematical models are proposed for statistical regularities of maximum daily precipitation within a wet period and total precipitation volume per wet period. The proposed models are based on the generalized negative binomial (GNB) distribution of the duration of a wet period. The GNB distribution is a mixed Poisson distribution, the mixing distribution being generalized gamma (GG). The GNB distribution demonstrates excellent fit with real data of durations of wet periods measured in days. By means of limit theorems for statistics constructed from samples with random sizes having the GNB distribution, asymptotic approximations are proposed for the distributions of maximum daily precipitation volume within a wet period and total precipitation volume for a wet period. It is shown that the exponent power parameter in the mixing GG distribution matches slow global climate trends. The bounds for the accuracy of the proposed approximations are presented. Several tests for daily precipitation, total precipitation volume and precipitation intensities to be abnormally extremal are proposed and compared to the traditional PoT-method. The results of the application of this test to real data are presented.


2020 ◽  
Vol 36 (11) ◽  
pp. 3563-3565
Author(s):  
Li Chen

Abstract Summary: Power analysis is essential to decide the sample size of metagenomic sequencing experiments in a case–control study for identifying differentially abundant (DA) microbes. However, the complexity of microbial data characteristics, such as excessive zeros, over-dispersion, compositionality, intrinsic microbial correlations and variable sequencing depths, makes the power analysis particularly challenging because the analytical form is usually unavailable. Here, we develop a simulation-based power assessment strategy and R package powmic, which considers the complexity of microbial data characteristics. A real data example demonstrates the usage of powmic. Availability and implementation: The powmic R package and online tutorial are available at https://github.com/lichen-lab/powmic. Supplementary information: Supplementary data are available at Bioinformatics online.


mSphere ◽  
2017 ◽  
Vol 2 (6) ◽  
Author(s):  
Xiang Gao ◽  
Huaiying Lin ◽  
Qunfeng Dong

ABSTRACT Dysbiosis of microbial communities is associated with various human diseases, raising the possibility of using microbial compositions as biomarkers for disease diagnosis. We have developed a Bayes classifier by modeling microbial compositions with Dirichlet-multinomial distributions, which are widely used to model multicategorical count data with extra variation. The parameters of the Dirichlet-multinomial distributions are estimated from training microbiome data sets based on maximum likelihood. The posterior probability of a microbiome sample belonging to a disease or healthy category is calculated based on Bayes’ theorem, using the likelihood values computed from the estimated Dirichlet-multinomial distribution, as well as a prior probability estimated from the training microbiome data set or previously published information on disease prevalence. When tested on real-world microbiome data sets, our method, called DMBC (for Dirichlet-multinomial Bayes classifier), shows better classification accuracy than the only existing Bayesian microbiome classifier based on a Dirichlet-multinomial mixture model and the popular random forest method. The advantage of DMBC is its built-in automatic feature selection, capable of identifying a subset of microbial taxa with the best classification accuracy between different classes of samples based on cross-validation. This unique ability enables DMBC to maintain and even improve its accuracy at modeling species-level taxa. The R package for DMBC is freely available at https://github.com/qunfengdong/DMBC. IMPORTANCE By incorporating prior information on disease prevalence, Bayes classifiers have the potential to estimate disease probability better than other common machine-learning methods. Thus, it is important to develop Bayes classifiers specifically tailored for microbiome data. 
Our method shows higher classification accuracy than the only existing Bayesian classifier and the popular random forest method, and thus provides an alternative option for using microbial compositions for disease diagnosis.
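The classification step described in this abstract (Dirichlet-multinomial likelihoods combined with a prior through Bayes' theorem) can be sketched as follows. This is a minimal illustration, not the DMBC package: the function names, the alpha vectors, and the priors are hypothetical, the alphas are fixed by hand rather than estimated by maximum likelihood, and the feature-selection step is omitted.

```python
import math

def dm_log_likelihood(x, alpha):
    """Log pmf of the Dirichlet-multinomial for counts x and parameters alpha."""
    n, a = sum(x), sum(alpha)
    ll = math.lgamma(n + 1) - sum(math.lgamma(xk + 1) for xk in x)
    ll += math.lgamma(a) - math.lgamma(n + a)
    ll += sum(math.lgamma(xk + ak) - math.lgamma(ak) for xk, ak in zip(x, alpha))
    return ll

def posterior(x, alphas, priors):
    """Posterior class probabilities via Bayes' theorem, computed in log space."""
    log_joint = {c: math.log(priors[c]) + dm_log_likelihood(x, alphas[c])
                 for c in alphas}
    m = max(log_joint.values())  # subtract the max for numerical stability
    weights = {c: math.exp(v - m) for c, v in log_joint.items()}
    z = sum(weights.values())
    return {c: w / z for c, w in weights.items()}

# Hypothetical per-class parameters over three taxa, and a prevalence-style prior.
alphas = {"healthy": [8.0, 1.0, 1.0], "disease": [1.0, 1.0, 8.0]}
priors = {"healthy": 0.9, "disease": 0.1}
print(posterior([2, 3, 45], alphas, priors))
```

In this made-up example the sample is dominated by the third taxon, so the disease class receives nearly all of the posterior mass even though its prior is only 0.1.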


2015 ◽  
Vol 33 (Suppl. 1) ◽  
pp. 11-16 ◽  
Author(s):  
Philippe Seksik ◽  
Cécilia Landman

The human gut contains 10^14 bacteria and many other micro-organisms such as Archaea, viruses and fungi. This gut microbiota has co-evolved with host determinants through symbiotic and co-dependent relationships. Bacteria, which represent 10 times the number of human cells, form the most depicted part of this black box owing to new tools. Re-evaluating the gut microbiota showed how this entity participates in gut physiology and beyond this in human health. Studying and handling this real 'hidden organ' remains a challenge for clinicians. In this review, we aimed to bring information about gut microbiota, its structure, its roles and the way to capture and measure it. After bacterial colonization in infancy, intestinal microbial composition is unique for each individual although more than 95% can be assigned to 4 major phyla. Besides its biodiversity, the major characteristics of gut microbiota are stability over time and resilience after perturbation. In pathological situations, dysbiosis (i.e. imbalance in gut microbiota composition) is observed with a loss in overall diversity. Dysbiosis associated with inflammatory bowel disease was specified with the reduction in biodiversity, the decreased representation of different taxa in the Firmicutes phylum and an increase in Gammaproteobacteria. Beyond depicting gut microbial composition, metagenomics allows the description of the combined genomes of the microorganisms present in the gut, giving access to their potential functions. In fact, each individual's overall microbial metagenome exceeds the size of the human genome by a factor of 150. Besides a functional core in which there is redundancy for mandatory functions assuring the robustness of the ecosystem, the human gut contains an important diversity and high number of non-redundant bacterial genes. 
Clinical data, treatment and all the factors able to influence the microbiome should enter integrated big data sets to shed light on pathways of interplay within the supra organism composed of gut microbiome and host. A better understanding of dynamics within human gut microbiota and microbes-host interaction will allow new insight into gut pathophysiology especially regarding resilience mechanisms and dysbiosis onset and maintenance. This will lead to description of biomarkers of diseases, development of new probiotics/prebiotics and new therapies.


Author(s):  
Josimar Vasconcelos ◽  
Renato Cintra ◽  
Abraão Nascimento

In recent years various probability models have been proposed for describing lifetime data. Increasing model flexibility is often sought as a means to better describe asymmetric and heavy tail distributions. Such extensions were pioneered by the beta-G family. However, efficient goodness-of-fit (GoF) measures for the beta-G distributions are sought. In this paper, we combine probability weighted moments (PWMs) and the Mellin transform (MT) in order to furnish new qualitative and quantitative GoF tools for model selection within the beta-G class. We derive PWMs for the Fréchet and Kumaraswamy distributions; and we provide expressions for the MT, and for the log-cumulants (LC) of the beta-Weibull, beta-Fréchet, beta-Kumaraswamy, and beta-log-logistic distributions. Subsequently, we construct LC diagrams and, based on Hotelling’s $T^2$ statistic, we derive confidence ellipses for the LCs. Finally, the proposed GoF measures are applied to five real data sets in order to demonstrate their applicability.


2020 ◽  
Author(s):  
Quy Xuan Cao ◽  
Xinxin Sun ◽  
Karun Rajesh ◽  
Naga Chalasani ◽  
Kayla Gelow ◽  
...  

Abstract Background: Accuracy of microbial community detection in 16S rRNA marker-gene and metagenomic studies suffers from contamination and sequencing errors that lead to either falsely identifying microbial taxa that were not in the sample or misclassifying the taxa of DNA fragment reads. Filtering is defined as removing taxa that are present in a small number of samples and have small counts in the samples where they are observed. This approach reduces extreme sparsity of microbiome data and has been shown to correctly remove contaminant taxa in cultured "mock" datasets, where the true taxa compositions are known. Although filtering is frequently used, careful evaluation of its effect on the data analysis and scientific conclusions remains unreported. Here, we assess the effect of filtering on the alpha and beta diversity estimation, as well as its impact on identifying taxa that discriminate between disease states. Results: The effect of filtering on microbiome data analysis is illustrated on four datasets: two mock quality control datasets where the same cultured samples with known microbial composition are processed at different labs and two disease study datasets. Results show that in microbiome quality control datasets, filtering reduces the magnitude of differences in alpha diversity and alleviates technical variability between labs, while preserving between-sample similarity (beta diversity). In the disease study datasets, DESeq2 and linear discriminant analysis Effect Size (LEfSe) methods were used to identify taxa that are differentially expressed across groups of samples, and random forest models to rank features with largest contribution towards disease classification. Results reveal that filtering retains significant taxa and preserves the model classification ability measured by the area under the receiver operating characteristic curve (AUC). 
The comparison between filtering and contaminant removal method shows that they have complementary effects and are advised to be used in conjunction. Conclusions: Filtering reduces the complexity of microbiome data, while preserving their integrity in downstream analysis. This leads to mitigation of the classification methods' sensitivity and reduction of technical variability, allowing researchers to generate more reproducible and comparable results in microbiome data analysis.
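The filtering rule defined in this abstract (drop taxa that occur in few samples and have small counts where they do occur) can be sketched as below. The thresholds `min_samples` and `min_count`, the function name, and the toy count table are assumptions for illustration; the abstract does not state the cut-offs the authors used.

```python
def filter_taxa(counts, min_samples=2, min_count=5):
    """Keep taxa unless they are both rare and low-count.

    counts: dict mapping taxon name -> list of per-sample read counts.
    A taxon is dropped only if it is present in fewer than `min_samples`
    samples AND every nonzero count is below `min_count`
    (both thresholds are hypothetical).
    """
    kept = {}
    for taxon, row in counts.items():
        present = [c for c in row if c > 0]
        rare = len(present) < min_samples
        low = all(c < min_count for c in present)
        if not (rare and low):
            kept[taxon] = row
    return kept

# Toy table: four taxa across four samples (made-up numbers).
counts = {
    "taxon_a": [120, 98, 143, 110],  # abundant everywhere: kept
    "taxon_b": [0, 0, 3, 0],         # one sample, tiny count: dropped
    "taxon_c": [0, 250, 0, 0],       # one sample, large count: kept
    "taxon_d": [1, 2, 0, 1],         # small counts, several samples: kept
}
print(sorted(filter_taxa(counts)))
```

Requiring both conditions follows the wording of the definition; a stricter variant that drops taxa meeting either condition would remove more of the table.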

