Heritability estimation and differential analysis of count data with generalized linear mixed models in genomic sequencing studies

2018 ◽  
Vol 35 (3) ◽  
pp. 487-496 ◽  
Author(s):  
Shiquan Sun ◽  
Jiaqiang Zhu ◽  
Sahar Mozaffari ◽  
Carole Ober ◽  
Mengjie Chen ◽  
...  

ABSTRACT

Motivation: Genomic sequencing studies, including RNA sequencing and bisulfite sequencing studies, are becoming increasingly common and increasingly large. Large genomic sequencing studies open doors for accurate molecular trait heritability estimation and powerful differential analysis. Heritability estimation and differential analysis in sequencing studies require the development of statistical methods that can properly account for the count nature of the sequencing data and that are computationally efficient for large data sets.

Results: Here, we develop such a method, PQLseq (Penalized Quasi-Likelihood for sequencing count data), to enable effective and efficient heritability estimation and differential analysis using the generalized linear mixed model framework. With extensive simulations and comparisons to previous methods, we show that PQLseq is the only method currently available that can produce unbiased heritability estimates for sequencing count data. In addition, we show that PQLseq is well suited for differential analysis in large sequencing studies, providing calibrated type I error control and more power compared to the standard linear mixed model methods. Finally, we apply PQLseq to perform gene expression heritability estimation and differential expression analysis in a large RNA sequencing study in the Hutterites.

Availability and implementation: PQLseq is implemented as an R package with source code freely available at www.xzlab.org/software.html and https://cran.r-project.org/web/packages/PQLseq/index.html.

Contact: XZ ([email protected])

Supplementary information: Supplementary data are available online.
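To make the point about the count nature of the data concrete, the following toy simulation may help. It is not the PQLseq algorithm: it only draws counts for sibling pairs from a Poisson model with a log link, which is assumed here as one common GLMM formulation, and compares a heritability estimate computed on the latent scale with a naive one computed from log-transformed counts. The function names, the sibling-pair design, and all parameter values are illustrative assumptions.

```python
import math
import random


def poisson(rng, lam):
    """Knuth's Poisson sampler; adequate for the small means used here."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def correlation(pairs):
    """Pearson correlation between the two members of each pair."""
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_sib_pairs(n_pairs, h2, sigma2=0.3, mu=1.5, seed=1):
    """Return (latent, counts), each a list of per-pair tuples."""
    rng = random.Random(seed)
    sd_g = math.sqrt(sigma2 * h2)          # genetic standard deviation
    sd_e = math.sqrt(sigma2 * (1.0 - h2))  # environmental standard deviation
    latent, counts = [], []
    for _ in range(n_pairs):
        shared = rng.gauss(0.0, 1.0)
        zs = []
        for _ in range(2):
            # Assumed relatedness 0.5: siblings share half the genetic variance.
            g = sd_g * math.sqrt(0.5) * (shared + rng.gauss(0.0, 1.0))
            zs.append(mu + g + rng.gauss(0.0, sd_e))
        latent.append(tuple(zs))
        counts.append(tuple(poisson(rng, math.exp(z)) for z in zs))
    return latent, counts


latent, counts = simulate_sib_pairs(n_pairs=5000, h2=0.6)

# With relatedness 0.5, the sibling correlation on the latent scale is h2 / 2.
h2_latent = 2.0 * correlation(latent)
# Naive estimate that treats log(count + 1) as if it were the latent trait.
h2_naive = 2.0 * correlation(
    [(math.log(a + 1), math.log(b + 1)) for a, b in counts]
)

print(f"latent-scale estimate: {h2_latent:.2f}")
print(f"naive log(count + 1) estimate: {h2_naive:.2f}")
```

In this sketch the naive estimate comes out lower than the latent-scale one, because Poisson sampling noise inflates the variance of the transformed counts without adding covariance between siblings. The latent values are only available because the data are simulated; with real data they are unobserved.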


2019 ◽  
Vol 37 (1) ◽  
pp. 41
Author(s):  
Amanda Marchi MAIORANO ◽  
Thiago Santos MOTA ◽  
Ana Carolina VERDUGO ◽  
Ricardo Antonio da Silva FARIA ◽  
Beatriz Pressi Molina da SILVA ◽  
...  

Tick resistance in the Bos taurus indicus (Nelore) and Bos taurus taurus (Simmental and Caracu) subspecies was compared using generalized linear mixed models (GLMMs) with Poisson and negative binomial distributions. Nelore (NE) animals are known to be more resistant than B. t. taurus animals. A difference in tick resistance between the Simmental (SI) and Caracu (CA) breeds has not been reported previously. Three artificial tick infestations were conducted to evaluate tick resistance in these breeds. The statistical aim of the present study was to present GLMMs as alternative models for evaluating tick count data. Tick resistance has not previously been analyzed with a negative binomial GLMM. The analyses were performed with the GLIMMIX procedure (PROC GLIMMIX) of SAS. The results showed that a GLMM with a negative binomial distribution is appropriate for evaluating tick count data with an excess of zero observations, avoiding overdispersion problems. Finally, multiple comparisons with the Bonferroni test revealed different patterns of tick infestation among the studied breeds, suggesting that NE is the most resistant breed, followed by CA.
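The overdispersion and excess-zero argument can be illustrated with a small simulation. This is a sketch under stated assumptions, not the study's SAS analysis: counts are drawn from a gamma-Poisson mixture (one standard construction of the negative binomial), and the mean of 6 and size of 2 are invented values rather than estimates from the tick data.

```python
import math
import random


def poisson(rng, lam):
    """Knuth's Poisson sampler; adequate for the small means used here."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def negative_binomial(rng, mean, size):
    """Gamma-Poisson mixture with variance mean + mean**2 / size."""
    lam = rng.gammavariate(size, mean / size)  # shape=size, scale=mean/size
    return poisson(rng, lam)


rng = random.Random(7)
counts = [negative_binomial(rng, mean=6.0, size=2.0) for _ in range(5000)]

n = len(counts)
m = sum(counts) / n
v = sum((c - m) ** 2 for c in counts) / (n - 1)

dispersion_index = v / m             # equals 1 for a Poisson distribution
size_hat = m * m / (v - m)           # method-of-moments estimate of the size
zero_fraction = counts.count(0) / n  # observed share of zero counts
poisson_zero = math.exp(-m)          # share a Poisson with this mean predicts

print(f"variance / mean: {dispersion_index:.2f}")
print(f"estimated size: {size_hat:.2f}")
print(f"zeros observed: {zero_fraction:.4f}, Poisson expects: {poisson_zero:.4f}")
```

A variance-to-mean ratio well above 1, together with more zeros than a Poisson distribution with the same mean would produce, is the kind of pattern that motivates the negative binomial over the Poisson for count data.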


2021 ◽  
pp. 096228022110175
Author(s):  
Jan P Burgard ◽  
Joscha Krause ◽  
Ralf Münnich ◽  
Domingo Morales

Obesity is considered to be one of the primary health risks in modern industrialized societies. Estimating the evolution of its prevalence over time is an essential element of public health reporting. This requires the application of suitable statistical methods on epidemiologic data with substantial local detail. Generalized linear-mixed models with medical treatment records as covariates mark a powerful combination for this purpose. However, the task is methodologically challenging. Disease frequencies are subject to both regional and temporal heterogeneity. Medical treatment records often show strong internal correlation due to diagnosis-related grouping. This frequently causes excessive variance in model parameter estimation due to rank-deficiency problems. Further, generalized linear-mixed models are often estimated via approximate inference methods as their likelihood functions do not have closed forms. These problems combined lead to unacceptable uncertainty in prevalence estimates over time. We propose an l2-penalized temporal logit-mixed model to solve these issues. We derive empirical best predictors and present a parametric bootstrap to estimate their mean-squared errors. A novel penalized maximum approximate likelihood algorithm for model parameter estimation is stated. With this new methodology, the regional obesity prevalence in Germany from 2009 to 2012 is estimated. We find that the national prevalence ranges between 15 and 16%, with significant regional clustering in eastern Germany.
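The effect of an l2 penalty on strongly correlated covariates can be sketched with a deliberately simplified model. The code below is not the authors' penalized maximum approximate likelihood algorithm: it fits a plain logit model with two nearly collinear covariates, no intercept, no random effects and no temporal structure, using Newton's method. The function `fit_logit`, the simulated data and the penalty value are all illustrative assumptions.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def fit_logit(xs, ys, penalty, n_iter=50):
    """Newton's method for a two-coefficient logit with an l2 penalty."""
    b1 = b2 = 0.0
    for _ in range(n_iter):
        # Gradient of the penalized log-likelihood and its negative Hessian.
        g1, g2 = -penalty * b1, -penalty * b2
        h11, h12, h22 = penalty, 0.0, penalty
        for (x1, x2), y in zip(xs, ys):
            p = sigmoid(b1 * x1 + b2 * x2)
            w = p * (1.0 - p)
            g1 += (y - p) * x1
            g2 += (y - p) * x2
            h11 += w * x1 * x1
            h12 += w * x1 * x2
            h22 += w * x2 * x2
        # Solve the 2x2 Newton system in closed form.
        det = h11 * h22 - h12 * h12
        b1 += (h22 * g1 - h12 * g2) / det
        b2 += (h11 * g2 - h12 * g1) / det
    return b1, b2


rng = random.Random(3)
xs, ys = [], []
for _ in range(800):
    x1 = rng.gauss(0.0, 1.0)
    x2 = x1 + 0.05 * rng.gauss(0.0, 1.0)  # nearly a copy of x1
    xs.append((x1, x2))
    # Only x1 drives the outcome in this simulated example.
    ys.append(1 if rng.random() < sigmoid(1.0 * x1) else 0)

mle = fit_logit(xs, ys, penalty=0.0)
ridge = fit_logit(xs, ys, penalty=1.0)

print("unpenalized coefficients:", mle)
print("l2-penalized coefficients:", ridge)
```

Because the two covariates carry almost the same information, the data pin down the sum of the coefficients well but not their difference. The penalty shrinks the poorly determined direction, which is the stabilizing effect the abstract attributes to penalization under rank-deficiency problems.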

