Heritability estimation and differential analysis of count data with generalized linear mixed models in genomic sequencing studies

ABSTRACT

Motivation: Genomic sequencing studies, including RNA sequencing and bisulfite sequencing studies, are becoming increasingly common and increasingly large. Large genomic sequencing studies open doors for accurate molecular trait heritability estimation and powerful differential analysis. Heritability estimation and differential analysis in sequencing studies require statistical methods that properly account for the count nature of sequencing data and that are computationally efficient for large data sets.

Results: Here, we develop such a method, PQLseq (Penalized Quasi-Likelihood for sequencing count data), to enable effective and efficient heritability estimation and differential analysis using the generalized linear mixed model framework. With extensive simulations and comparisons to previous methods, we show that PQLseq is the only method currently available that can produce unbiased heritability estimates for sequencing count data. In addition, we show that PQLseq is well suited for differential analysis in large sequencing studies, providing calibrated type I error control and more power than standard linear mixed model methods. Finally, we apply PQLseq to perform gene expression heritability estimation and differential expression analysis in a large RNA sequencing study in the Hutterites.

Availability and implementation: PQLseq is implemented as an R package with source code freely available at www.xzlab.org/software.html and https://cran.r-project.org/web/packages/PQLseq/index.html.

Contact: XZ ([email protected])

Supplementary information: Supplementary data are available online.
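The abstract does not spell out the model, so the following is only a minimal sketch of the kind of count data a Poisson mixed model describes, not the PQLseq implementation. It assumes a log link, a heritable random effect with covariance proportional to a relatedness matrix `K`, and an independent noise term. The function name, the way `K` is built, and all parameter values are hypothetical.

```python
import numpy as np

def simulate_poisson_mixed_counts(n=500, h2=0.5, sigma2=0.4, depth=1e4, seed=0):
    """Simulate read counts for one gene under an assumed Poisson mixed model."""
    rng = np.random.default_rng(seed)
    # Hypothetical relatedness matrix K built from random standardized "genotypes".
    G = rng.standard_normal((n, 200))
    m = G.shape[1]
    K = G @ G.T / m
    # Heritable effect g with covariance sigma2 * h2 * K, drawn as a linear map of i.i.d. normals.
    g = np.sqrt(sigma2 * h2) * (G @ rng.standard_normal(m)) / np.sqrt(m)
    # Independent noise e with variance sigma2 * (1 - h2).
    e = rng.normal(0.0, np.sqrt(sigma2 * (1.0 - h2)), size=n)
    # Log-scale expression proportion; the intercept is an arbitrary baseline.
    log_rate = np.log(1e-3) + g + e
    # Per-individual sequencing depth, then the observed counts.
    total_reads = rng.poisson(depth, size=n)
    counts = rng.poisson(total_reads * np.exp(log_rate))
    return counts, K, total_reads

counts, K, total_reads = simulate_poisson_mixed_counts()
```

Under these assumptions `h2` is the share of the log-scale variance attributable to `K`. The random effects also push the variance of `counts` well above its mean, which is one reason methods built for Gaussian data can be a poor fit for sequencing counts.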

Flexible Bayesian Dirichlet Mixtures of Generalized Linear Mixed Models for Count Data

Scientific African, 2021, pp. e00963. DOI: 10.1016/j.sciaf.2021.e00963

Author(s): Olumide S. Adesina, Dawud A. Agunbiade, Pelumi E. Oguntunde

Keyword(s): Count Data, Mixed Models, Generalized Linear Mixed Models, Linear Mixed Models

A Generalized Concordance Correlation Coefficient Based on the Variance Components Generalized Linear Mixed Models for Overdispersed Count Data

Biometrics, 2009, Vol 66 (3), pp. 897-904. DOI: 10.1111/j.1541-0420.2009.01335.x. Cited by ~25.

Author(s): Josep L. Carrasco

Keyword(s): Correlation Coefficient, Count Data, Mixed Models, Variance Components, Generalized Linear Mixed Models, Linear Mixed Models, Concordance Correlation Coefficient, Concordance Correlation

Bayesian Prediction of Spatial Count Data Using Generalized Linear Mixed Models

Biometrics, 2002, Vol 58 (2), pp. 280-286. DOI: 10.1111/j.0006-341x.2002.00280.x. Cited by ~89.

Author(s): Ole F. Christensen, Rasmus Waagepetersen

Keyword(s): Count Data, Mixed Models, Generalized Linear Mixed Models, Linear Mixed Models, Bayesian Prediction, Spatial Count Data

COMPARATIVE STUDY OF CATTLE TICK RESISTANCE USING GENERALIZED LINEAR MIXED MODELS

REVISTA BRASILEIRA DE BIOMETRIA, 2019, Vol 37 (1), pp. 41. DOI: 10.28951/rbb.v37i1.341

Author(s): Amanda Marchi MAIORANO, Thiago Santos MOTA, Ana Carolina VERDUGO, Ricardo Antonio da Silva FARIA, Beatriz Pressi Molina da SILVA, ...

Keyword(s): Negative Binomial Distribution, Count Data, Binomial Distribution, Mixed Models, Negative Binomial, Bos Taurus, Generalized Linear Mixed Models, Linear Mixed Models, Tick Count, Tick Resistance

Tick resistance in the Bos taurus indicus (Nelore) and Bos taurus taurus (Simmental and Caracu) subspecies was compared using generalized linear mixed models (GLMMs) with Poisson and negative binomial distributions. Nelore animals (NE) are known to show greater resistance than B. t. taurus animals. A difference in tick resistance between the Simmental (SI) and Caracu (CA) breeds has not been reported previously. Three artificial tick infestations were conducted to evaluate tick resistance in these breeds. The statistical aim of the present study was to present GLMMs as alternative models for the evaluation of tick count data. Analysis of tick resistance by a GLMM with a negative binomial distribution has not been assessed previously. The analyses were performed with the PROC GLIMMIX procedure of the SAS program. The results showed that a GLMM with a negative binomial distribution is appropriate for evaluating tick count data with an excess of zero observations, avoiding overdispersion problems. Finally, multiple comparisons with the Bonferroni test showed different patterns of tick infestation among the studied breeds, suggesting that NE is the most resistant breed, followed by CA.
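The overdispersion issue mentioned in this abstract can be illustrated with a small simulation. This is a sketch on synthetic counts, not the study's data or its SAS analysis: the mean `mu` and size parameter `k` are hypothetical, and `dispersion_index` is an illustrative helper rather than part of any package.

```python
import numpy as np

def dispersion_index(counts):
    """Variance-to-mean ratio: close to 1 for Poisson data, above 1 when overdispersed."""
    counts = np.asarray(counts, dtype=float)
    return counts.var(ddof=1) / counts.mean()

rng = np.random.default_rng(1)
mu, k = 5.0, 0.5  # hypothetical mean count and negative binomial size parameter

poisson_counts = rng.poisson(mu, size=5000)
# NumPy parameterizes the negative binomial by (n, p); setting n = k and
# p = k / (k + mu) gives mean mu and variance mu + mu**2 / k.
nb_counts = rng.negative_binomial(k, k / (k + mu), size=5000)

poisson_ratio = dispersion_index(poisson_counts)
nb_ratio = dispersion_index(nb_counts)
zero_fraction_nb = np.mean(nb_counts == 0)
```

With a small `k`, the negative binomial sample has many zeros and a variance far above its mean. A Poisson model cannot reproduce that pattern, because it forces the variance to equal the mean.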

l2-Penalized temporal logit-mixed models for the estimation of regional obesity prevalence over time

Statistical Methods in Medical Research, 2021, pp. 096228022110175. DOI: 10.1177/09622802211017583

Author(s): Jan P Burgard, Joscha Krause, Ralf Münnich, Domingo Morales

Keyword(s): Parameter Estimation, Medical Treatment, Mixed Models, Generalized Linear Mixed Models, Linear Mixed Models, Obesity Prevalence, Model Parameter, Model Parameter Estimation, Public Health Reporting, Over Time

Obesity is considered to be one of the primary health risks in modern industrialized societies. Estimating the evolution of its prevalence over time is an essential element of public health reporting. This requires the application of suitable statistical methods on epidemiologic data with substantial local detail. Generalized linear-mixed models with medical treatment records as covariates mark a powerful combination for this purpose. However, the task is methodologically challenging. Disease frequencies are subject to both regional and temporal heterogeneity. Medical treatment records often show strong internal correlation due to diagnosis-related grouping. This frequently causes excessive variance in model parameter estimation due to rank-deficiency problems. Further, generalized linear-mixed models are often estimated via approximate inference methods as their likelihood functions do not have closed forms. These problems combined lead to unacceptable uncertainty in prevalence estimates over time. We propose an l2-penalized temporal logit-mixed model to solve these issues. We derive empirical best predictors and present a parametric bootstrap to estimate their mean-squared errors. A novel penalized maximum approximate likelihood algorithm for model parameter estimation is stated. With this new methodology, the regional obesity prevalence in Germany from 2009 to 2012 is estimated. We find that the national prevalence ranges between 15 and 16%, with significant regional clustering in eastern Germany.
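To see why an l2 penalty helps with the rank-deficiency problem described here, consider a plain logistic regression with two identical covariates. The sketch below is a simplified stand-in, not the authors' algorithm: it has no random effects and no temporal structure, and the function name, penalty value `lam`, and synthetic data are all hypothetical.

```python
import numpy as np

def fit_l2_logit(X, y, lam=1.0, n_iter=50):
    """Newton iterations for logistic regression with penalty lam * ||beta||^2 / 2."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        # Gradient of the penalized log-likelihood.
        grad = X.T @ (y - p) - lam * beta
        # The lam * I term keeps this matrix invertible even when X is rank deficient.
        hess = X.T @ (X * (p * (1.0 - p))[:, None]) + lam * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

rng = np.random.default_rng(2)
x = rng.standard_normal(300)
X = np.column_stack([x, x])  # two identical covariates, so X has rank 1
y = (rng.random(300) < 1.0 / (1.0 + np.exp(-1.5 * x))).astype(float)
beta_hat = fit_l2_logit(X, y, lam=1.0)
```

Without the penalty the Newton system is singular for this design, since any split of the effect between the two columns fits equally well. The penalty selects the split with the smallest norm, which here shares the effect equally between the duplicated covariates.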
