High performance implementation of the hierarchical likelihood for generalized linear mixed models: an application to estimate the potassium reference range in massive electronic health records datasets

Abstract Background: Converting electronic health record (EHR) entries into useful clinical inferences requires one to address the poor scalability of existing implementations of Generalized Linear Mixed Models (GLMM) for repeated measures. The major computational bottleneck is the numerical evaluation of multivariable integrals, which even for the simplest EHR analyses may involve millions of dimensions (one for each patient). The hierarchical likelihood (h-lik) approach is a methodologically rigorous framework for estimating GLMMs. It is based on the Laplace Approximation (LA), which replaces integration with numerical optimization and thus scales very well with dimensionality. Methods: We present a high-performance, direct implementation of the h-lik for GLMMs in the R package TMB. Using this approach, we examined the relationship between repeated serum potassium measurements and survival in the Cerner Real World Data (CRWD) EHR database. Analyzing these data requires the evaluation of an integral in over 3 million dimensions, which puts this problem beyond the reach of conventional approaches. We also assessed the scalability and accuracy of the LA in smaller samples, 1% and 10% of the size of the full dataset. These samples were analyzed via a) the original interconnected Generalized Linear Models (iGLM) approach to h-lik, b) Adaptive Gauss-Hermite (AGH) quadrature and c) Markov Chain Monte Carlo (MCMC), the gold standard for multivariate integration. Results: Random effects estimates generated by the LA were within 10% of the values obtained by the iGLM, AGH and MCMC techniques. The h-lik approach was 4–30 times faster than AGH and nearly 800 times faster than MCMC. The major clinical inferences in this problem are the establishment of a non-linear relationship between potassium level and the risk of mortality, as well as estimates of the individual and health care facility sources of variation in mortality risk in CRWD.
Conclusions: We found that the direct implementation of the h-lik offers a computationally efficient, numerically accurate approach to GLMM analysis of extremely large, real-world repeated measures data. The clinical inferences from our analysis may guide the choice of thresholds for treating potassium disorders in the clinic.
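The key computational idea above is that the LA replaces an integral over a random effect with an optimization. This can be illustrated for a single patient. The following is a minimal sketch, not the authors' TMB implementation. It assumes a random-intercept logistic model with made-up values for the fixed effect `beta` and random-effect SD `sigma`. It then compares the LA with adaptive Gauss-Hermite (AGH) quadrature centred at the same mode. All function names are hypothetical.

```python
import numpy as np


def log_joint(u, y, beta, sigma):
    """h(u) = log p(y | u) + log N(u; 0, sigma^2) for one patient's random intercept u."""
    eta = beta + u                                      # same linear predictor for every visit
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))   # Bernoulli log-likelihood, logit link
    logprior = -0.5 * u**2 / sigma**2 - 0.5 * np.log(2.0 * np.pi * sigma**2)
    return loglik + logprior


def mode_and_curvature(y, beta, sigma, iters=50):
    """Newton's method for the mode u_hat of h(u); also returns h''(u_hat), which is negative."""
    n, u = len(y), 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(beta + u)))
        grad = y.sum() - n * p - u / sigma**2
        hess = -n * p * (1.0 - p) - 1.0 / sigma**2
        step = grad / hess
        u -= step
        if abs(step) < 1e-12:
            break
    p = 1.0 / (1.0 + np.exp(-(beta + u)))
    return u, -n * p * (1.0 - p) - 1.0 / sigma**2


def log_marginal_laplace(y, beta, sigma):
    """Laplace approximation of log of integral exp(h(u)) du, using only the mode and curvature."""
    u_hat, hess = mode_and_curvature(y, beta, sigma)
    return log_joint(u_hat, y, beta, sigma) + 0.5 * np.log(2.0 * np.pi) - 0.5 * np.log(-hess)


def log_marginal_agh(y, beta, sigma, n_nodes):
    """Adaptive Gauss-Hermite: nodes centred at u_hat and scaled by 1 / sqrt(-h''(u_hat))."""
    u_hat, hess = mode_and_curvature(y, beta, sigma)
    s = 1.0 / np.sqrt(-hess)
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    # integrand rewritten as exp(-x^2) * exp(h(u) + x^2) with u = u_hat + sqrt(2) * s * x
    vals = np.array([log_joint(u_hat + np.sqrt(2.0) * s * xk, y, beta, sigma) + xk**2 for xk in x])
    m = vals.max()                                      # log-sum-exp for numerical stability
    return np.log(np.sqrt(2.0) * s) + m + np.log(np.sum(w * np.exp(vals - m)))


# Hypothetical data: one patient's five binary outcomes
y = np.array([1, 0, 1, 1, 0])
beta, sigma = -0.2, 1.0                                 # assumed fixed effect and random-effect SD
print("Laplace       :", log_marginal_laplace(y, beta, sigma))
print("AGH, 1 node   :", log_marginal_agh(y, beta, sigma, 1))   # identical to Laplace
print("AGH, 20 nodes :", log_marginal_agh(y, beta, sigma, 20))
```

With a single node, AGH reduces exactly to the LA. Adding nodes buys accuracy at a cost that grows with the number of nodes per random effect. The LA needs only the mode and curvature, which is what lets it scale to millions of patient-level random effects. The real CRWD model also has facility-level effects and many covariates, which this toy omits.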

Report Quality of Generalized Linear Mixed Models in Psychology: A Systematic Review

Frontiers in Psychology ◽

10.3389/fpsyg.2021.666182 ◽

2021 ◽

Vol 12 ◽

Author(s):

Roser Bono ◽

Rafael Alarcón ◽

María J. Blanca

Keyword(s):

Systematic Review ◽

Mixed Models ◽

Repeated Measures ◽

Web Of Science ◽

Generalized Linear Mixed Models ◽

Linear Mixed Models ◽

Journal Citation Reports ◽

Report Quality ◽

Fixed And Random Effects

Generalized linear mixed models (GLMMs) estimate fixed and random effects and are especially useful when the dependent variable is binary, ordinal, count or quantitative but not normally distributed. They are also useful when the dependent variable involves repeated measures, since GLMMs can model autocorrelation. This study aimed to determine how and how often GLMMs are used in psychology and to summarize how information about them is presented in published articles. Our focus in this respect was mainly on frequentist models. To review studies applying GLMMs in psychology, we searched the Web of Science for articles published over the period 2014–2018. A total of 316 empirical articles published from 2014 to 2018 were selected for the trend study. We then conducted a systematic review of 118 GLMM analyses from 80 empirical articles indexed in Journal Citation Reports during 2018 to evaluate report quality. Results showed that the use of GLMMs increased over time and that 86.4% of articles were published in first- or second-quartile journals. Although GLMMs have, in recent years, been increasingly used in psychology, the majority of articles did not report most of the important information about them. Report quality needs to be improved in line with current recommendations for the use of GLMMs.
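Random effects are the reason repeated measures from the same subject are correlated in a GLMM. A small simulation makes this concrete. The following is a minimal sketch, assuming a binary outcome and a random-intercept logistic model with made-up parameter values; the function names are hypothetical. Under a random intercept alone, the induced correlation is the same for every pair of visits. Modelling serial autocorrelation proper needs a richer structure, such as an autoregressive random-effect process.

```python
import numpy as np


def simulate_binary_glmm(n_subjects, n_visits, beta0, sigma_u, seed=0):
    """Simulate y[i, j] ~ Bernoulli(expit(beta0 + u_i)) with u_i ~ N(0, sigma_u^2)."""
    rng = np.random.default_rng(seed)
    u = rng.normal(0.0, sigma_u, size=n_subjects)       # one random intercept per subject
    p = 1.0 / (1.0 + np.exp(-(beta0 + u)))               # subject-specific success probability
    # the same p applies to every visit of subject i; visits are independent given u_i
    return (rng.random((n_subjects, n_visits)) < p[:, None]).astype(int)


def visit_correlation(y, j=0, k=1):
    """Pearson correlation, across subjects, between visits j and k."""
    return np.corrcoef(y[:, j], y[:, k])[0, 1]


for sigma_u in (0.0, 0.5, 2.0):
    y = simulate_binary_glmm(n_subjects=20000, n_visits=4, beta0=0.0, sigma_u=sigma_u)
    print(f"sigma_u = {sigma_u}: corr(visit 1, visit 2) = {visit_correlation(y):.3f}")
```

With `sigma_u = 0` the visits are independent and the correlation is near zero. It grows with `sigma_u`, which is why ignoring the subject-level random effect would understate the uncertainty of the fixed effects.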

Hierarchical likelihood methods for nonlinear and generalized linear mixed models with missing data and measurement errors in covariates

Journal of Multivariate Analysis ◽

10.1016/j.jmva.2012.02.011 ◽

2012 ◽

Vol 109 ◽

pp. 42-51 ◽

Cited By ~ 5

Author(s):

Maengseok Noh ◽

Lang Wu ◽

Youngjo Lee

Keyword(s):

Missing Data ◽

Mixed Models ◽

Measurement Errors ◽

Generalized Linear Mixed Models ◽

Linear Mixed Models ◽

Likelihood Methods ◽

Hierarchical Likelihood

Repeated Measures Design with Generalized Linear Mixed Models for Randomized Controlled Trials, by Toshiro Tango

Journal of Biopharmaceutical Statistics ◽

10.1080/10543406.2017.1362625 ◽

2017 ◽

Vol 27 (6) ◽

pp. 1121-1122

Author(s):

Misoo C. Ellison

Keyword(s):

Randomized Controlled Trials ◽

Mixed Models ◽

Repeated Measures ◽

Generalized Linear Mixed Models ◽

Linear Mixed Models ◽

Controlled Trials ◽

Repeated Measures Design ◽

Randomized Controlled

Mixed Effects Models

10.1093/oso/9780198869979.003.0008 ◽

2021 ◽

pp. 209-234

Author(s):

Justin C. Touchon

Keyword(s):

Likelihood Ratio ◽

Data Structures ◽

Mixed Models ◽

Repeated Measures ◽

Generalized Linear Mixed Models ◽

Statistical Significance ◽

Linear Mixed Models ◽

Mixed Effects ◽

Likelihood Ratio Tests ◽

Mixed Effects Models

Mixed effects models are powerful techniques for controlling for non-independence of data or repeated measures, and can be applied to both normal and non-normal data structures. Chapter 8 teaches readers how to code, assess, interpret, and troubleshoot both linear and generalized linear mixed models using the same RxP dataset used throughout the book, now viewed through a new lens. Readers are taught how to code likelihood ratio tests to calculate statistical significance and how to use multiple packages, such as lme4 and glmmTMB.
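The chapter teaches likelihood ratio tests (LRTs) in R with lme4 and glmmTMB. Independently of those packages, the arithmetic of a one-degree-of-freedom LRT can be sketched in Python from two maximized log-likelihoods. The numbers below are hypothetical, and `lrt_1df` is an illustrative name. The `boundary` option is an assumption added here, not something the chapter description mentions. When the extra parameter is a variance component tested at zero, the plain chi-square reference is conservative, and a 50:50 mixture of chi-square(0) and chi-square(1) is commonly used instead.

```python
import math


def lrt_1df(loglik_null, loglik_alt, boundary=False):
    """LRT for two nested models whose parameter counts differ by one.

    Returns (statistic, p-value), using P(chi2_1 > x) = erfc(sqrt(x / 2)).
    If boundary=True (extra parameter is a variance tested at zero), use the
    50:50 mixture of chi2_0 and chi2_1, which halves the p-value for x > 0.
    """
    stat = max(2.0 * (loglik_alt - loglik_null), 0.0)  # clip tiny negatives from optimizer noise
    p = math.erfc(math.sqrt(stat / 2.0))
    if boundary and stat > 0.0:
        p *= 0.5
    return stat, p


# Hypothetical maximized log-likelihoods: a GLM vs. the same model plus a random intercept
stat, p_naive = lrt_1df(-1234.5, -1231.5)
_, p_boundary = lrt_1df(-1234.5, -1231.5, boundary=True)
print(stat, p_naive, p_boundary)
```

Both models must be fitted by maximum likelihood on the same data for the statistic to be valid.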

Generalized Mixed Modeling in Massive Electronic Health Record Databases: What is a Healthy Serum Potassium?

10.21203/rs.3.rs-245946/v1 ◽

2021 ◽

Author(s):

Cristian G. Bologa ◽

Vernon Shane Pankratz ◽

Mark L Unruh ◽

Maria Eleni Roumelioti ◽

Vallabh Shah ◽

...

Keyword(s):

Electronic Health Record ◽

Real World ◽

Serum Potassium ◽

Repeated Measures ◽

Linear Models ◽

Care Facility ◽

Health Care Facility ◽

Health Record ◽

Real World Data ◽

Electronic Health

Abstract Background: Converting electronic health record (EHR) entries into useful clinical inferences requires one to address the poor scalability of existing implementations of Generalized Linear Mixed Models (GLMM) for repeated measures. The major computational bottleneck is the numerical evaluation of multivariable integrals, which even for the simplest EHR analyses may involve millions of dimensions (one for each patient). The hierarchical likelihood (h-lik) approach is a methodologically rigorous framework for estimating GLMMs. It is based on the Laplace Approximation (LA), which replaces integration with numerical optimization and thus scales very well with dimensionality. Methods: We present a high-performance implementation of the h-lik for GLMMs in the R package TMB. Using this approach, we examined the relationship between repeated serum potassium measurements and survival in the Cerner Real World Data (CRWD) EHR database. Analyzing these data requires the evaluation of an integral in over 3 million dimensions, which puts this problem beyond the reach of conventional approaches. We also assessed the scalability and accuracy of the LA in smaller samples, 1% and 10% of the size of the full dataset. These samples were analyzed via a) the original interconnected Generalized Linear Models (iGLM) approach to h-lik, b) Adaptive Gauss-Hermite (AGH) quadrature and c) Markov Chain Monte Carlo (MCMC), the gold standard for multivariate integration. Results: Random effects estimates generated by the LA were within 10% of the values obtained by the iGLM, AGH and MCMC techniques. The h-lik approach was 4–30 times faster than AGH and nearly 800 times faster than MCMC. The major clinical inferences in this problem are the establishment of a non-linear relationship between potassium level and the risk of mortality, as well as estimates of the individual and health care facility sources of variation in mortality risk in CRWD.
Conclusions: We found that the combination of the LA and automatic differentiation (AD) offers a computationally efficient, numerically accurate approach to GLMM analysis of extremely large, real-world repeated measures data via the h-lik. The clinical inference from our analysis may guide the choice of thresholds for treating potassium disorders in the clinic.

Repeated Measures Design with Generalized Linear Mixed Models for Randomized Controlled Trials

10.1201/9781315152097 ◽

2017 ◽

Cited By ~ 6

Author(s):

Toshiro Tango

Keyword(s):

Randomized Controlled Trials ◽

Mixed Models ◽

Repeated Measures ◽

Generalized Linear Mixed Models ◽

Linear Mixed Models ◽

Controlled Trials ◽

Repeated Measures Design ◽

Randomized Controlled

Use of Sandwich Variance Estimation in Generalized Linear Mixed Models: for Binary Repeated Measures Data

10.5176/2251-1938_ors16.12 ◽

2016 ◽

Author(s):

A.A. Sunethra ◽

M.R. Sooriyarachchi

Keyword(s):

Mixed Models ◽

Repeated Measures ◽

Variance Estimation ◽

Generalized Linear Mixed Models ◽

Linear Mixed Models ◽

Repeated Measures Data

l2-Penalized temporal logit-mixed models for the estimation of regional obesity prevalence over time

Statistical Methods in Medical Research ◽

10.1177/09622802211017583 ◽

2021 ◽

pp. 096228022110175

Author(s):

Jan P Burgard ◽

Joscha Krause ◽

Ralf Münnich ◽

Domingo Morales

Keyword(s):

Parameter Estimation ◽

Medical Treatment ◽

Mixed Models ◽

Generalized Linear Mixed Models ◽

Linear Mixed Models ◽

Obesity Prevalence ◽

Model Parameter ◽

Model Parameter Estimation ◽

Public Health Reporting ◽

Over Time

Obesity is considered to be one of the primary health risks in modern industrialized societies. Estimating the evolution of its prevalence over time is an essential element of public health reporting. This requires applying suitable statistical methods to epidemiologic data with substantial local detail. Generalized linear-mixed models with medical treatment records as covariates form a powerful combination for this purpose. However, the task is methodologically challenging. Disease frequencies are subject to both regional and temporal heterogeneity. Medical treatment records often show strong internal correlation due to diagnosis-related grouping. This frequently causes excessive variance in model parameter estimation due to rank-deficiency problems. Further, generalized linear-mixed models are often estimated via approximate inference methods, as their likelihood functions do not have closed forms. Combined, these problems lead to unacceptable uncertainty in prevalence estimates over time. We propose an l2-penalized temporal logit-mixed model to solve these issues. We derive empirical best predictors and present a parametric bootstrap to estimate their mean-squared errors. A novel penalized maximum approximate likelihood algorithm for model parameter estimation is presented. With this new methodology, the regional obesity prevalence in Germany from 2009 to 2012 is estimated. We find that the national prevalence ranges between 15 and 16%, with significant regional clustering in eastern Germany.
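The abstract's core remedy is an l2 penalty that stabilizes estimation when covariates are strongly correlated or rank-deficient. That idea can be illustrated without the mixed-model and temporal components. The following is a minimal sketch, not the authors' penalized maximum approximate likelihood algorithm. It is a plain fixed-effects logistic regression with a ridge penalty, fitted by Newton's method on simulated, deliberately rank-deficient data. The function name `ridge_logit`, the penalty weight `lam`, and the convention that column 0 is an unpenalized intercept are all hypothetical choices.

```python
import numpy as np


def ridge_logit(X, y, lam, iters=100, tol=1e-10):
    """Maximize loglik(beta) - (lam / 2) * ||beta[1:]||^2 by Newton's method.

    Assumes column 0 of X is an intercept, which is left unpenalized.
    """
    k = X.shape[1]
    penalty = lam * np.eye(k)
    penalty[0, 0] = 0.0
    beta = np.zeros(k)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p) - penalty @ beta
        info = X.T @ (X * (p * (1.0 - p))[:, None]) + penalty  # negative Hessian
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Hypothetical rank-deficient design: the second covariate duplicates the first
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
X = np.column_stack([np.ones_like(x), x, x])           # columns 1 and 2 are identical
y = (rng.random(2000) < 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * x)))).astype(float)

print("rank of X:", np.linalg.matrix_rank(X))          # 2 < 3 columns: unpenalized MLE not unique
print("ridge estimates:", ridge_logit(X, y, lam=1.0))  # finite; the duplicated columns share the weight
```

Without the penalty, the negative Hessian is singular along the direction that trades one duplicated coefficient against the other, so the Newton system has no unique solution. The penalty makes it positive definite, at the cost of a small shrinkage bias.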
