scholarly journals High performance implementation of the hierarchical likelihood for generalized linear mixed models: an application to estimate the potassium reference range in massive electronic health records datasets

2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Cristian G. Bologa ◽  
Vernon Shane Pankratz ◽  
Mark L. Unruh ◽  
Maria Eleni Roumelioti ◽  
Vallabh Shah ◽  
...  

Abstract Background Converting electronic health record (EHR) entries to useful clinical inferences requires one to address the poor scalability of existing implementations of Generalized Linear Mixed Models (GLMM) for repeated measures. The major computational bottleneck concerns the numerical evaluation of multivariable integrals, which even for the simplest EHR analyses may involve millions of dimensions (one for each patient). The hierarchical likelihood (h-lik) approach to GLMMs is a methodologically rigorous framework for the estimation of GLMMs that is based on the Laplace Approximation (LA), which replaces integration with numerical optimization, and thus scales very well with dimensionality. Methods We present a high-performance, direct implementation of the h-lik for GLMMs in the R package TMB. Using this approach, we examined the relation of repeated serum potassium measurements and survival in the Cerner Real World Data (CRWD) EHR database. Analyzing this data requires the evaluation of an integral in over 3 million dimensions, putting this problem beyond the reach of conventional approaches. We also assessed the scalability and accuracy of LA in smaller samples of 1 and 10% size of the full dataset that were analyzed via the a) original, interconnected Generalized Linear Models (iGLM), approach to h-lik, b) Adaptive Gaussian Hermite (AGH) and c) the gold standard for multivariate integration Markov Chain Monte Carlo (MCMC). Results Random effects estimates generated by the LA were within 10% of the values obtained by the iGLMs, AGH and MCMC techniques. The H-lik approach was 4–30 times faster than AGH and nearly 800 times faster than MCMC. The major clinical inferences in this problem are the establishment of the non-linear relationship between the potassium level and the risk of mortality, as well as estimates of the individual and health care facility sources of variations for mortality risk in CRWD. Conclusions We found that the direct implementation of the h-lik offers a computationally efficient, numerically accurate approach for the analysis of extremely large, real world repeated measures data via the h-lik approach to GLMMs. The clinical inference from our analysis may guide choices of treatment thresholds for treating potassium disorders in the clinic.

2021 ◽  
Vol 12 ◽  
Author(s):  
Roser Bono ◽  
Rafael Alarcón ◽  
María J. Blanca

Generalized linear mixed models (GLMMs) estimate fixed and random effects and are especially useful when the dependent variable is binary, ordinal, count or quantitative but not normally distributed. They are also useful when the dependent variable involves repeated measures, since GLMMs can model autocorrelation. This study aimed to determine how and how often GLMMs are used in psychology and to summarize how the information about them is presented in published articles. Our focus in this respect was mainly on frequentist models. In order to review studies applying GLMMs in psychology we searched the Web of Science for articles published over the period 2014–2018. A total of 316 empirical articles were selected for trend study from 2014 to 2018. We then conducted a systematic review of 118 GLMM analyses from 80 empirical articles indexed in Journal Citation Reports during 2018 in order to evaluate report quality. Results showed that the use of GLMMs increased over time and that 86.4% of articles were published in first- or second-quartile journals. Although GLMMs have, in recent years, been increasingly used in psychology, most of the important information about them was not stated in the majority of articles. Report quality needs to be improved in line with current recommendations for the use of GLMMs.


2021 ◽  
pp. 209-234
Author(s):  
Justin C. Touchon

Mixed effects models are powerful techniques for controlling for non-independence of data or repeated measures, and can be harnessed for both normal and non-normal data structures. Chapter 8 teaches readers how to code, assess, interpret, and troubleshoot both linear and generalized linear mixed models using the same RxP dataset which has been used throughout the book, although now it is viewed through a new lens. Readers are taught how to code likelihood ratio tests to calculate statistical significance and how to use multiple packages, such as lme4 and glmmTMB.


2021 ◽  
Author(s):  
Cristian G. Bologa ◽  
Vernon Shane Pankratz ◽  
Mark L Unruh ◽  
Maria Eleni Roumelioti ◽  
Vallabh Shah ◽  
...  

Abstract Background: Converting electronic health record (EHR) entries to useful clinical inferences requires one to address the poor scalability of existing implementations of Generalized Linear Mixed Models (GLMM) for repeated measures. The major computational bottleneck concerns the numerical evaluation of multivariable integrals, which even for the simplest EHR analyses may involve millions of dimensions (one for each patient). The hierarchical likelihood (h-lik) approach to GLMMs is a methodologically rigorous framework for the estimation of GLMMs that is based on the Laplace Approximation (LA), which replaces integration with numerical optimization, and thus scales very well with dimensionality. Methods: We present a high-performance implementation of the h-lik for GLMMs in the R package TMB. Using this approach, we examined the relation of repeated serum potassium measurements and survival in the Cerner Real World Data (CRWD) EHR database. Analyzing this data requires the evaluation of an integral in over 3 million dimensions, putting this problem beyond the reach of conventional approaches. We also assessed the scalability and accuracy of LA in smaller samples of 1 and 10% size of the full dataset that were analyzed via the a) original, interconnected Generalized Linear Models (iGLM), approach to h-lik, b) Adaptive Gaussian Hermite (AGH) and c) the gold standard of Markov Chain Monte Carlo (MCMC) for multivariate integration. Results: Random effects estimates generated by the LA were within 10% of the values obtained by the iGLMs, AGH and MCMC techniques. The H-lik approach was 4-30 times faster than AGH and nearly 800 times faster than MCMC. The major clinical inferences in this problem are the establishment of the non-linear relationship between the potassium level and the risk of mortality, as well as estimates of the individual and health care facility sources of variations for mortality risk in CRWD. Conclusions: We found that the combination of the LA and AD offers a computationally efficient, numerically accurate approach for the analysis of extremely large, real world repeated measures data via the h-lik approach to GLMMs. The clinical inference from our analysis may guide choices of threatment thresholds for treating potassium disorders in the clinic.


2021 ◽  
pp. 096228022110175
Author(s):  
Jan P Burgard ◽  
Joscha Krause ◽  
Ralf Münnich ◽  
Domingo Morales

Obesity is considered to be one of the primary health risks in modern industrialized societies. Estimating the evolution of its prevalence over time is an essential element of public health reporting. This requires the application of suitable statistical methods on epidemiologic data with substantial local detail. Generalized linear-mixed models with medical treatment records as covariates mark a powerful combination for this purpose. However, the task is methodologically challenging. Disease frequencies are subject to both regional and temporal heterogeneity. Medical treatment records often show strong internal correlation due to diagnosis-related grouping. This frequently causes excessive variance in model parameter estimation due to rank-deficiency problems. Further, generalized linear-mixed models are often estimated via approximate inference methods as their likelihood functions do not have closed forms. These problems combined lead to unacceptable uncertainty in prevalence estimates over time. We propose an l2-penalized temporal logit-mixed model to solve these issues. We derive empirical best predictors and present a parametric bootstrap to estimate their mean-squared errors. A novel penalized maximum approximate likelihood algorithm for model parameter estimation is stated. With this new methodology, the regional obesity prevalence in Germany from 2009 to 2012 is estimated. We find that the national prevalence ranges between 15 and 16%, with significant regional clustering in eastern Germany.


Sign in / Sign up

Export Citation Format

Share Document