Negative Binomial factor regression with application to microbiome data analysis

2021 ◽  
Author(s):  
Christian Lorenz Muller ◽  
Aditya Kumar Mishra

The human microbiome provides essential physiological functions and helps maintain host homeostasis via the formation of intricate ecological host-microbiome relationships. While it is well established that the lifestyle of the host, dietary preferences, demographic background, and health status can influence microbial community composition and dynamics, robust generalizable associations between specific host-associated factors and specific microbial taxa have remained largely elusive. Here, we propose factor regression models that allow the estimation of structured parsimonious associations between host-related features and amplicon-derived microbial taxa. To account for the overdispersed nature of the amplicon sequencing count data, we propose Negative Binomial reduced rank regression (NB-RRR) and Negative Binomial co-sparse factor regression (NB-FAR). While NB-RRR encodes the underlying dependency among the microbial abundances as outcomes and the host-associated features as predictors through a rank-constrained coefficient matrix, NB-FAR uses a sparse singular value decomposition of the coefficient matrix. The latter approach avoids the notoriously difficult joint parameter estimation by extracting sparse unit-rank components of the coefficient matrix sequentially. To solve the non-convex optimization problems associated with these factor regression models, we present a novel iterative block-wise majorization procedure. Extensive simulation studies and an application to the microbial abundance data from the American Gut Project demonstrate the efficacy of the proposed procedure. In the American Gut Project data, we identify key factors that strongly link dietary habits and host lifestyle to specific microbial families.
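As a rough illustration of the two ingredients named in this abstract, the sketch below builds a rank-constrained coefficient matrix by truncated SVD and draws overdispersed counts through a log link. The dimensions, the NB2 parameterization (variance mu + mu^2/phi), and the function names are assumptions made for illustration only; the authors' block-wise majorization estimator is not reproduced here.

```python
import numpy as np

def truncate_rank(C, r):
    """Best rank-r approximation of C (Frobenius norm) via truncated SVD."""
    U, d, Vt = np.linalg.svd(C, full_matrices=False)
    return (U[:, :r] * d[:r]) @ Vt[:r, :]

def nb_variance(mu, phi):
    """Assumed NB2 variance function: mu + mu^2 / phi, with phi > 0."""
    return mu + mu**2 / phi

rng = np.random.default_rng(0)
n, p, q, r = 50, 6, 8, 2                 # samples, predictors, taxa, rank (made up)
X = rng.normal(size=(n, p))              # hypothetical host-associated features
C = truncate_rank(rng.normal(scale=0.3, size=(p, q)), r)
mu = np.exp(X @ C)                       # log link: mean count per sample and taxon
phi = 2.0                                # hypothetical dispersion parameter

# NumPy's negative_binomial(n, p) has mean n * (1 - p) / p, so setting
# n = phi and p = phi / (phi + mu) gives mean mu and variance mu + mu^2 / phi.
Y = rng.negative_binomial(phi, phi / (phi + mu))
```

Because the variance exceeds the mean for any finite phi, counts drawn this way are overdispersed relative to a Poisson model with the same mean.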

Microbiome ◽  
2021 ◽  
Vol 9 (1) ◽  
Author(s):  
Lars Snipen ◽  
Inga-Leena Angell ◽  
Torbjørn Rognes ◽  
Knut Rudi

Abstract Background Studies of shifts in microbial community composition have many applications. For studies at species or subspecies levels, 16S amplicon sequencing lacks resolution and is often replaced by full shotgun sequencing. Due to higher costs, this restricts the number of samples sequenced. As an alternative to full shotgun sequencing, we have investigated the use of Reduced Metagenome Sequencing (RMS) to estimate the composition of a microbial community. This involves the use of double-digested restriction-associated DNA sequencing, which means only a smaller fraction of the genomes is sequenced. The read sets obtained by this approach have properties different from both amplicon and shotgun data, and analysis pipelines for either type either cannot be used at all or do not exploit the full potential of RMS data. Results We suggest a procedure for analyzing such data, based on fragment clustering and the use of a constrained ordinary least squares de-convolution for estimating the relative abundance of all community members. Mock community datasets show the potential to clearly separate strains even when the 16S is 100% identical and genome-wide differences are < 0.02, indicating RMS has a very high resolution. From a simulation study, we compare RMS to shotgun sequencing and show that we get improved abundance estimates when the community has many very closely related genomes. From a real dataset of infant guts, we show that RMS is capable of detecting a strain diversity gradient for Escherichia coli across time. Conclusion We find that RMS is a good alternative to either metabarcoding or shotgun sequencing when it comes to resolving microbial communities at the strain level. Like shotgun metagenomics, it requires a good database of reference genomes and is well suited for studies of the human gut or other communities where many reference genomes exist. A data analysis pipeline is offered as an R package at https://github.com/larssnip/microRMS.
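The constrained least-squares de-convolution mentioned above can be sketched as follows. The abstract does not say which constraint is used, so this example assumes non-negativity and uses SciPy's `nnls`; the matrix `A` (fragment clusters by genomes) and the function name are hypothetical and do not reflect the microRMS package interface.

```python
import numpy as np
from scipy.optimize import nnls

def estimate_abundances(A, y):
    """Solve min ||A x - y|| subject to x >= 0, then rescale x to sum to 1."""
    x, _ = nnls(A, y)
    total = x.sum()
    return x / total if total > 0 else x

# Hypothetical toy data: 3 fragment clusters, 2 genomes.
# A[i, j] = copies of fragment cluster i in genome j (made-up values).
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
true_abundance = np.array([0.75, 0.25])
y = A @ (100 * true_abundance)     # noise-free read counts per fragment cluster
est = estimate_abundances(A, y)
```

With noise-free counts the estimate recovers the assumed abundances; closely related genomes make the columns of `A` nearly collinear, which is where the choice of constraint matters most.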


2014 ◽  
Vol 2014 ◽  
pp. 1-6
Author(s):  
Zhijun Luo ◽  
Lirong Wang

A new parallel variable distribution algorithm, based on an interior point sequential system of linear equations (SSLE) algorithm, is proposed for solving inequality-constrained optimization problems whose constraints are block-separable. Each iteration of this algorithm only needs to solve three systems of linear equations with the same coefficient matrix to obtain the descent direction. Furthermore, under certain conditions, global convergence is achieved.


Author(s):  
Moritz Berger ◽  
Gerhard Tutz

Abstract A flexible semiparametric class of models is introduced that offers an alternative to classical regression models for count data, such as the Poisson and Negative Binomial models, as well as to more general models accounting for excess zeros that are also based on fixed distributional assumptions. The model lets the data themselves determine the distribution of the response variable but, in its basic form, uses a parametric term that specifies the effect of explanatory variables. In addition, an extended version is considered, in which the effects of covariates are specified nonparametrically. The proposed model and traditional models are compared in simulations and in several real data applications from the areas of health and social science.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Gongchao Jing ◽  
Yufeng Zhang ◽  
Wenzhi Cui ◽  
Lu Liu ◽  
Jian Xu ◽  
...  

Abstract Background Due to their much lower costs in experiment and computation than metagenomic whole-genome sequencing (WGS), 16S rRNA gene amplicons have been widely used for predicting the functional profiles of microbiomes, via software tools such as PICRUSt 2. However, due to potential PCR bias and gene profile variation among phylogenetically related genomes, functional profiles predicted from 16S amplicons may deviate from WGS-derived ones, resulting in misleading results. Results Here we present Meta-Apo, which greatly reduces or even eliminates such deviation and thus yields much more consistent diversity patterns between the two approaches. Tests of Meta-Apo on > 5000 16S-rRNA amplicon human microbiome samples from 4 body sites showed that the deviation between the two strategies is significantly reduced by using only 15 WGS-amplicon training sample pairs. Moreover, Meta-Apo enables cross-platform functional comparison between WGS and amplicon samples, thus greatly improving 16S-based microbiome diagnosis, e.g. accuracy of gingivitis diagnosis via 16S-derived functional profiles was elevated from 65 to 95% by WGS-based classification. Therefore, with the low cost of 16S-amplicon sequencing, Meta-Apo can produce a reliable, high-resolution view of microbiome function equivalent to that offered by shotgun WGS. Conclusions This suggests that large-scale, function-oriented microbiome sequencing projects can probably benefit from the lower cost of the 16S-amplicon strategy, without sacrificing the precision in functional reconstruction that otherwise requires WGS. An optimized C++ implementation of Meta-Apo is available on GitHub (https://github.com/qibebt-bioinfo/meta-apo) under a GNU GPL license. It takes the functional profiles of a few paired WGS:16S-amplicon samples as training data, and outputs the calibrated functional profiles for the much larger number of 16S-amplicon samples.


Author(s):  
Tamara J. H. M. van Bergen ◽  
Ana B. Rios-Miguel ◽  
Tom M. Nolte ◽  
Ad M. J. Ragas ◽  
Rosalie van Zelm ◽  
...  

Abstract Pharmaceuticals find their way to the aquatic environment via wastewater treatment plants (WWTPs). Biotransformation plays an important role in mitigating environmental risks; however, a mechanistic understanding of the processes involved is limited. The aim of this study was to evaluate potential relationships between the first-order biotransformation rate constants (kb) of nine pharmaceuticals and both the initial concentration of the selected compounds and the sampling season of the activated sludge inocula used. Four-day bottle experiments were performed with activated sludge from WWTP Groesbeek (The Netherlands) from two different seasons, summer and winter, spiked with two environmentally relevant concentrations (3 and 30 nM) of pharmaceuticals. Concentrations of the compounds were measured by LC–MS/MS, microbial community composition was assessed by 16S rRNA gene amplicon sequencing, and kb values were calculated. The biodegradable pharmaceuticals were acetaminophen, metformin, metoprolol, terbutaline, and phenazone (ranked from high to low biotransformation rates). Carbamazepine, diatrizoic acid, diclofenac, and fluoxetine were not converted. Summer and winter inocula did not show significant differences in microbial community composition, but resulted in a slightly different kb for some pharmaceuticals. Microbial activity, rather than community composition, was likely responsible. In the same inoculum, different kb values were measured, depending on initial concentration. In general, biodegradable compounds had a higher kb when the initial concentration was higher. This demonstrates that Michaelis-Menten kinetic theory has shortcomings for some pharmaceuticals at low, environmentally relevant concentrations and that the pharmaceutical concentration should be taken into account when measuring the kb in order to reliably predict the fate of pharmaceuticals in the WWTP.
Key points
• Biotransformation and sorption of pharmaceuticals were assessed in activated sludge.
• Higher initial concentrations resulted in higher biotransformation rate constants for biodegradable pharmaceuticals.
• Summer and winter inocula produced slightly different biotransformation rate constants although microbial community composition did not significantly change.
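To make the kinetic argument concrete, the sketch below fits a first-order rate constant from the decay C(t) = C0 · exp(−kb · t) and compares it with the apparent first-order constant implied by Michaelis-Menten kinetics, Vmax / (Km + C). All numerical values and function names are made up for illustration and are not taken from the study.

```python
import numpy as np

def fit_first_order_kb(t, conc):
    """Least-squares slope of ln(C) against t; returns kb = -slope."""
    slope, _intercept = np.polyfit(t, np.log(conc), 1)
    return -slope

def mm_apparent_kb(c, vmax, km):
    """Michaelis-Menten rate divided by concentration: vmax / (km + c)."""
    return vmax / (km + c)

# Hypothetical noise-free time series over a four-day experiment.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])          # days
conc = 30.0 * np.exp(-0.5 * t)                    # nM, assumed true kb = 0.5 per day
kb = fit_first_order_kb(t, conc)

# Under Michaelis-Menten kinetics the apparent kb never rises with concentration.
kb_low = mm_apparent_kb(3.0, vmax=1.0, km=1000.0)
kb_high = mm_apparent_kb(30.0, vmax=1.0, km=1000.0)
```

Under these assumptions the apparent constant is nearly flat for C much smaller than Km and can only decrease as C grows, so a kb that increases with initial concentration, as reported in the abstract, is not what the Michaelis-Menten form predicts.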


2012 ◽  
Vol 24 (4) ◽  
pp. 1047-1084 ◽  
Author(s):  
Xiao-Tong Yuan ◽  
Shuicheng Yan

We investigate Newton-type optimization methods for solving piecewise linear systems (PLSs) with a nondegenerate coefficient matrix. Such systems arise, for example, from the numerical solution of linear complementarity problems, which are useful for modeling several learning and optimization problems. In this letter, we propose an effective damped Newton method, PLS-DN, to find the exact (up to machine precision) solution of nondegenerate PLSs. PLS-DN exhibits a provable semiiterative property, that is, the algorithm converges globally to the exact solution in a finite number of iterations. The rate of convergence is shown to be at least linear before termination. We emphasize the applications of our method in modeling, from a novel perspective of PLSs, some statistical learning problems such as box-constrained least squares, elitist Lasso (Kowalski & Torrésani, 2008), and support vector machines (Cortes & Vapnik, 1995). Numerical results on synthetic and benchmark data sets are presented to demonstrate the effectiveness and efficiency of PLS-DN on these problems.


SLEEP ◽  
2021 ◽  
Vol 44 (Supplement_2) ◽  
pp. A271-A271
Author(s):  
Azizi Seixas ◽  
Nicholas Pantaleo ◽  
Samrachana Adhikari ◽  
Michael Grandner ◽  
Giardin Jean-Louis

Abstract Introduction Causes of COVID-19 burden in urban, suburban, and rural counties are unclear, as early studies provide mixed results implicating high prevalence of pre-existing health risks and chronic diseases. However, poor sleep health, which has been linked to infection-based pandemics, may provide additional insight into place-based burden. To address this gap, we investigated the relationship between habitual insufficient sleep (sleep < 7 hrs. per 24 hr. period) and COVID-19 cases and deaths across urban, suburban, and rural counties in the US. Methods County-level variables were obtained from the 2014–2018 American Community Survey five-year estimates and the Centers for Disease Control and Prevention. These included percent with insufficient sleep, percent uninsured, percent obese, and social vulnerability index. County-level COVID-19 infection and death data through September 12, 2020 were obtained from USA Facts. Cumulative COVID-19 infections and deaths for urban (n=68), suburban (n=740), and rural (n=2331) counties were modeled using separate negative binomial mixed effects regression models with logarithmic link and random state-level intercepts. Zero-inflated models were considered for deaths among suburban and rural counties to account for excess zeros. Results Multivariate regression models indicated positive associations between cumulative COVID-19 infection rates and insufficient sleep in urban, suburban and rural counties. The incidence rate ratio (IRR) was 1.03 (95% CI: 1.01 – 1.05) for urban counties, 1.04 (95% CI: 1.02 – 1.05) for suburban counties, and 1.02 (95% CI: 1.00 – 1.03) for rural counties. Similar positive associations were observed with county-level COVID-19 death rates, IRR = 1.11 (95% CI: 1.07 – 1.16) for urban counties, IRR = 1.04 (95% CI: 1.01 – 1.06) for suburban counties, and IRR = 1.03 (95% CI: 1.01 – 1.05) for rural counties. Level of urbanicity moderated the association between insufficient sleep and COVID deaths, but not the association between insufficient sleep and COVID infection rates. Conclusion Insufficient sleep was associated with COVID-19 infection cases and mortality rates in urban, suburban and rural counties. Level of urbanicity only moderated the relationship between insufficient sleep and COVID death rates. Future studies should investigate individual-level analysis to understand the role of sleep in mitigating COVID-19 infection and death rates. Support (if any) NIH (K07AG052685, R01MD007716, R01HL142066, K01HL135452, R01HL152453)
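For readers unfamiliar with incidence rate ratios, the sketch below shows how an IRR and its Wald confidence interval follow from a log-link regression coefficient: IRR = exp(beta), with limits exp(beta ± z · se). The coefficient and standard error are hypothetical values chosen so that the IRR equals 1.03; they are not the fitted values from this study, and the abstract does not state the unit of the sleep predictor.

```python
import math

def irr_with_ci(beta, se, z=1.96):
    """Return (IRR, lower, upper) for a log-link coefficient and its standard error."""
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))

# Hypothetical inputs: beta chosen so that exp(beta) = 1.03; se is made up.
beta = math.log(1.03)
se = 0.01
irr, lower, upper = irr_with_ci(beta, se)

# Effects multiply on the rate scale: a k-unit increase in the predictor
# scales the expected count by IRR ** k.
ten_unit_effect = irr ** 10
```

An IRR of 1.03 therefore corresponds to a 3% higher expected count per one-unit increase in the predictor, holding the other covariates fixed.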

