A unified framework for unconstrained and constrained ordination of microbiome read count data

PLoS ONE ◽  
2019 ◽  
Vol 14 (2) ◽  
pp. e0205474 ◽  
Author(s):  
Stijn Hawinkel ◽  
Frederiek-Maarten Kerckhof ◽  
Luc Bijnens ◽  
Olivier Thas

Abstract: Explorative visualization techniques provide a first summary of microbiome read count datasets through dimension reduction. A plethora of dimension reduction methods exists, but many of them focus primarily on sample ordination, failing to elucidate the role of the bacterial species. Moreover, implicit but often unrealistic assumptions underlying these methods fail to account for overdispersion and differences in sequencing depth, which are two typical characteristics of sequencing data. We combine log-linear models with a dispersion estimation algorithm and flexible response function modelling into a framework for unconstrained and constrained ordination. The method allows easy filtering of technical confounders. As opposed to most existing ordination methods, the assumptions underlying the method are stated explicitly and can be verified using simple diagnostics. The combination of unconstrained and constrained ordination in the same framework is unique in the field and greatly facilitates microbiome data exploration. We illustrate the advantages of our method on simulated and real datasets, while pointing out flaws in existing methods. The algorithms for fitting and plotting are available in the R package RCM.
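As a rough illustration of why sequencing depth matters, the sketch below computes expected counts under a plain independence log-linear model (expected count = sample total × taxon total / grand total) together with Pearson residuals. This is not the RCM algorithm; the function name `independence_residuals` is hypothetical, and the Poisson-style scaling by the square root of the expected count is an assumption that ignores the overdispersion the abstract describes.

```python
import numpy as np

def independence_residuals(counts):
    """Expected counts under sample/taxon independence, plus Pearson residuals.

    `counts` is a samples x taxa matrix of read counts. Row sums play the
    role of sequencing depth, column sums the role of overall taxon abundance.
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    # Outer product of the margins gives the independence expectation.
    expected = np.outer(counts.sum(axis=1), counts.sum(axis=0)) / total
    if (expected == 0).any():
        raise ValueError("remove all-zero samples or taxa first")
    # Poisson-style scaling (an assumption; overdispersed data need more).
    residuals = (counts - expected) / np.sqrt(expected)
    return expected, residuals
```

Two samples with the same composition but a threefold difference in depth, such as `[[10, 20], [30, 60]]`, give residuals of zero: depth alone creates no departure from independence, so any structure left in the residuals is what an ordination would then try to summarize.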


2017 ◽  
Vol 18 (1) ◽  
pp. 24-49 ◽  
Author(s):  
Wagner H. Bonat ◽  
Bent Jørgensen ◽  
Célestin C. Kokonendji ◽  
John Hinde ◽  
Clarice G. B. Demétrio

We propose a new class of discrete generalized linear models based on the class of Poisson–Tweedie factorial dispersion models with variance of the form $\mu + \phi\mu^{p}$, where $\mu$ is the mean and $\phi$ and $p$ are the dispersion and Tweedie power parameters, respectively. The models are fitted by using an estimating function approach obtained by combining the quasi-score and Pearson estimating functions for the estimation of the regression and dispersion parameters, respectively. This provides a flexible and efficient regression methodology for a comprehensive family of count models including Hermite, Neyman Type A, Pólya–Aeppli, negative binomial and Poisson-inverse Gaussian. The estimating function approach allows us to extend the Poisson–Tweedie distributions to deal with underdispersed count data by allowing negative values for the dispersion parameter $\phi$. Furthermore, the Poisson–Tweedie family can automatically adapt to highly skewed count data with excessive zeros, without the need to introduce zero-inflated or hurdle components, by the simple estimation of the power parameter. Thus, the proposed models offer a unified framework to deal with under-, equi-, overdispersed, zero-inflated and heavy-tailed count data. The computational implementation of the proposed models is fast, relying only on a simple Newton scoring algorithm. Simulation studies showed that the estimating function approach provides unbiased and consistent estimators for both regression and dispersion parameters. We highlight the ability of the Poisson–Tweedie distributions to deal with count data through a consideration of dispersion, zero-inflated and heavy tail indices, and illustrate their application with four data analyses. We provide an R implementation and the datasets as supplementary materials.
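The role of the two parameters can be seen in a small sketch. It assumes the variance function has the form Var(Y) = μ + φμ^p, the usual form for Poisson–Tweedie models; the function names are hypothetical and the code only evaluates the mean-variance relationship, it does not fit a model.

```python
def poisson_tweedie_variance(mu, phi, p):
    """Variance mu + phi * mu**p (assumed Poisson-Tweedie form)."""
    if mu <= 0:
        raise ValueError("mu must be positive")
    return mu + phi * mu ** p

def dispersion_index(mu, phi, p):
    """Variance-to-mean ratio: 1 is equidispersed, above 1 overdispersed."""
    return poisson_tweedie_variance(mu, phi, p) / mu
```

With φ = 0 the variance equals the mean, as for a Poisson model. With p = 2 and φ > 0 the variance is μ + φμ², the negative binomial form. A negative φ pushes the dispersion index below 1, which is the underdispersed case the abstract mentions.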


2018 ◽  
Author(s):  
Jukka Intosalmi ◽  
Henrik Mannerström ◽  
Saara Hiltunen ◽  
Harri Lähdesmäki

Abstract. Motivation: Modern single cell RNA sequencing (scRNA-seq) technologies have made it possible to measure the RNA content of individual cells. The scRNA-seq data provide us with detailed information about the cellular states but, despite several pioneering efforts, it remains an open research question how regulatory networks could be inferred from these noisy discrete read count data. Results: Here, we introduce a hierarchical regression model which is designed for detecting dependencies in scRNA-seq and other count data. We model count data using the Poisson-log normal distribution and, by means of our hierarchical formulation, detect the dependencies between genes using a linear regression model for the latent, cell-specific gene expression rate parameters. The hierarchical formulation allows us to model count data without artificial data transformations and makes it possible to incorporate normalization information directly into the latent layer of the model. We test the proposed approach using both simulated and experimental data. Our results show that the proposed approach performs better than standard regression techniques in both the parameter inference task and the variable selection task. Availability: An implementation of the method is available at https://github.com/jeintos/[email protected],[email protected]
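The generative idea behind a Poisson-log normal hierarchy can be sketched with a two-gene simulation. This is an illustrative assumption, not the authors' implementation: the function name, the parameter values, and the choice of a single regulator gene are all hypothetical. The latent log expression rate of a target gene depends linearly on that of a regulator, and the observed counts are Poisson draws whose rates are scaled by a cell-specific size factor (the normalization information).

```python
import numpy as np

def simulate_poisson_lognormal(n_cells=500, beta=0.8, seed=0):
    """Simulate counts for a regulator and a target gene (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    # Cell-specific size factors stand in for normalization information.
    size_factor = rng.uniform(0.5, 2.0, size=n_cells)
    # Latent layer: Gaussian log-rates, linked by a linear regression.
    log_rate_regulator = rng.normal(1.0, 0.5, size=n_cells)
    log_rate_target = (0.5 + beta * log_rate_regulator
                       + rng.normal(0.0, 0.3, size=n_cells))
    # Observation layer: Poisson counts given the scaled latent rates.
    counts_regulator = rng.poisson(size_factor * np.exp(log_rate_regulator))
    counts_target = rng.poisson(size_factor * np.exp(log_rate_target))
    return size_factor, counts_regulator, counts_target
```

Inference in such a model would aim to recover `beta` from the counts and size factors alone; the sketch covers only the forward simulation.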


Author(s):  
Tieming Ji ◽  
Jie Chen

Abstract: As one of the most recent advanced technologies developed for biomedical research, next generation sequencing (NGS) technology has opened more opportunities for scientific discovery of genetic information. NGS technology is particularly useful in elucidating a genome for the analysis of DNA copy number variants (CNVs). The study of CNVs is important because many genetic studies have concluded that cancer development, genetic disorders, and other diseases are often related to CNVs on the genome. One way to analyze NGS data for detecting boundaries of CNV regions on a chromosome or a genome is to phrase the problem as a statistical change point detection problem in the read count data. We therefore provide a statistical change point model to help detect CNVs using the NGS read count data. We use a Bayesian approach to incorporate possible parameter changes in the underlying distribution of the NGS read count data. Posterior probabilities for the change point inferences are derived. Extensive simulation studies have shown advantages of our proposed methods. The proposed methods are also applied to a publicly available lung cancer cell line NGS dataset, and CNV regions on this cell line are successfully identified.
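To make the change point formulation concrete, here is a minimal sketch of a posterior over a single change point in a sequence of read counts. It is a simplified stand-in, not the model from the paper: it assumes Poisson counts with one rate before and one after the change, independent Gamma(a, b) priors (shape a, rate b) on both rates, and a uniform prior on the change location. All function and parameter names are hypothetical.

```python
import math

def log_marginal(total, m, a=1.0, b=1.0):
    """Log marginal likelihood of m Poisson counts summing to `total`,
    with the rate integrated out under a Gamma(a, b) prior.
    The factorial terms are omitted because they cancel across splits."""
    return (a * math.log(b) - math.lgamma(a)
            + math.lgamma(a + total) - (a + total) * math.log(b + m))

def change_point_posterior(counts, a=1.0, b=1.0):
    """Posterior over k = 1..n-1, where the first k counts share one rate.
    Element i of the result corresponds to k = i + 1."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two observations")
    total = sum(counts)
    log_post, left = [], 0
    for k in range(1, n):
        left += counts[k - 1]
        log_post.append(log_marginal(left, k, a, b)
                        + log_marginal(total - left, n - k, a, b))
    # Normalize in a numerically stable way.
    peak = max(log_post)
    weights = [math.exp(v - peak) for v in log_post]
    norm = sum(weights)
    return [w / norm for w in weights]
```

For a sequence whose level shifts clearly, for example six counts near 5 followed by six counts near 20, the posterior mass concentrates on the split between the two runs. Detecting several CNV boundaries would require extending the sketch to multiple change points.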


2015 ◽  
Vol 34 (9) ◽  
pp. 1577-1589 ◽  
Author(s):  
Hong Zhang ◽  
Jinfeng Xu ◽  
Ning Jiang ◽  
Xiaohua Hu ◽  
Zewei Luo

2012 ◽  
Vol 28 (21) ◽  
pp. 2747-2754 ◽  
Author(s):  
Vincent Plagnol ◽  
James Curtis ◽  
Michael Epstein ◽  
Kin Y. Mok ◽  
Emma Stebbings ◽  
...  

Author(s):  
A. Colin Cameron ◽  
Pravin K. Trivedi
