Revisiting Methods For Modeling Longitudinal and Survival Data: The Framingham Heart Study

2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Julius S. Ngwa ◽  
Howard J. Cabral ◽  
Debbie M. Cheng ◽  
David R. Gagnon ◽  
Michael P. LaValley ◽  
...  

Abstract
Background: Statistical methods for modeling longitudinal and time-to-event data have received much attention in medical research and are becoming increasingly useful. In clinical studies, such as those of cancer and AIDS, longitudinal biomarkers are used to monitor disease progression and to predict survival. These longitudinal measures are often missing at failure times and may be prone to measurement error. More importantly, time-dependent survival models that include the raw longitudinal measurements may lead to biased results. In previous studies, these two types of data have frequently been analyzed separately: a mixed-effects model is used for the longitudinal data and a survival model is applied to the event outcome.
Methods: In this paper we compare joint maximum-likelihood methods, a two-step approach, and a time-dependent covariate method that link longitudinal data to survival data, with emphasis on using longitudinal measures to predict survival. We apply a Bayesian semi-parametric joint method and a maximum-likelihood joint method that maximizes the joint likelihood of the time-to-event and longitudinal measures. We also implement the two-step approach, which estimates the random effects separately, and a classic time-dependent covariate model. We use simulation studies to assess bias, accuracy, and coverage probabilities for the estimates of the link parameter that connects the longitudinal measures to survival times.
Results: Simulation results demonstrate that the two-step approach performed best at estimating the link parameter when variability in the longitudinal measure is low, but is somewhat biased downwards when the variability is high. The Bayesian semi-parametric and maximum-likelihood joint methods yield higher link-parameter estimates under both low and high variability in the longitudinal measure. The time-dependent covariate method consistently underestimated the link parameter. We illustrate these methods using data from the Framingham Heart Study, in which lipid measurements and myocardial infarction data were collected over a period of 26 years.
Conclusions: Traditional methods for modeling longitudinal and survival data, such as the time-dependent covariate method, which use the observed longitudinal data, tend to provide downwardly biased estimates. The two-step approach and joint models provide better estimates, although a comparison of these methods may depend on the underlying residual variance.
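
To fix notation, a common shared-random-effects formulation of such a joint model (in the spirit of Wulfsohn and Tsiatis; the abstract does not specify the exact submodels, so this is an illustrative sketch) couples a linear mixed model for the biomarker with a proportional-hazards model for the event:

\[
y_i(t) = m_i(t) + \varepsilon_i(t), \qquad
m_i(t) = (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\,t, \qquad
\varepsilon_i(t) \sim N(0, \sigma^2),
\]
\[
h_i(t) = h_0(t)\,\exp\{\alpha\, m_i(t)\},
\]

where \(\alpha\) is the link parameter connecting the longitudinal trajectory to the hazard. The time-dependent covariate method substitutes the raw observed \(y_i(t)\) for \(m_i(t)\); classical measurement-error attenuation then pulls the estimate of \(\alpha\) toward zero, which is consistent with the underestimation reported above.

The two-step approach can be sketched in a few lines of Python. This assumes a long-format pandas DataFrame long_df with columns id, time, y and a one-row-per-subject frame surv_df with columns id, duration, event; these names, and the choice of the subject-specific slope as the survival covariate, are illustrative assumptions rather than the authors' implementation.

```python
import pandas as pd
import statsmodels.formula.api as smf
from lifelines import CoxPHFitter

# Step 1: linear mixed-effects model with a random intercept and slope.
# The estimated random effects (BLUPs) smooth out measurement error in
# the raw biomarker values.
mixed = smf.mixedlm("y ~ time", long_df, groups=long_df["id"],
                    re_formula="~time").fit()

ranef = pd.DataFrame(mixed.random_effects).T   # one row per subject id
fixef = mixed.fe_params

# Subject-specific slope = fixed-effect slope + random slope (one common
# trajectory summary; the fitted value m_i(t) is another choice).
surv_df = surv_df.set_index("id")
surv_df["slope"] = fixef["time"] + ranef["time"]

# Step 2: Cox model with the estimated trajectory summary as covariate;
# its coefficient plays the role of the link parameter alpha.
cph = CoxPHFitter()
cph.fit(surv_df.reset_index()[["duration", "event", "slope"]],
        duration_col="duration", event_col="event")
cph.print_summary()
```

Because step 1 ignores the event process when estimating the random effects, the plug-in covariate is itself only an estimate; informally, this is why the two-step estimator degrades as residual variability in the longitudinal measure grows.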


2014 ◽  
Vol 2 (1) ◽  
pp. 13-74 ◽  
Author(s):  
Mark J. van der Laan

Abstract
Suppose that we observe a population of causally connected units. On each unit, at each time point on a grid, we observe a set of other units the unit is potentially connected with, and a unit-specific longitudinal data structure consisting of baseline and time-dependent covariates, a time-dependent treatment, and a final outcome of interest. The target quantity of interest is defined as the mean outcome for this group of units if the exposures of the units were probabilistically assigned according to a known specified mechanism, where the latter is called a stochastic intervention. Causal effects of interest are defined as contrasts of the mean of the unit-specific outcomes under the different stochastic interventions one wishes to evaluate. This covers a large range of estimation problems, from independent units and independent clusters of units to a single cluster of units in which each unit has a limited number of connections to other units. The allowed dependence includes treatment allocation in response to data on multiple units and so-called causal interference as special cases. We present a few motivating classes of examples, propose a structural causal model, define the desired causal quantities, address the identification of these quantities from the observed data, and define maximum-likelihood-based estimators based on cross-validation. In particular, we present maximum-likelihood-based super-learning for this network data. Nonetheless, such smoothed/regularized maximum likelihood estimators are not targeted and will thereby be overly biased with respect to the target parameter; as a consequence, they will generally not yield asymptotically normally distributed estimators of the statistical target parameter.
To formally develop estimation theory, we focus on the simpler case in which the longitudinal data structure is a point-treatment data structure. We formulate a novel targeted maximum likelihood estimator of this estimand and show that the double robustness of the efficient influence curve implies that the bias of the targeted minimum loss-based estimator (TMLE) will be a second-order term involving products of differences of two nuisance parameters. In particular, the TMLE will be consistent if either one of these nuisance parameters is consistently estimated. Because of the causal dependencies between units, the data set may correspond to the realization of a single experiment, so establishing a (e.g., normal) limit distribution for the targeted maximum likelihood estimators, and corresponding statistical inference, is a challenging topic. We prove two formal theorems establishing asymptotic normality using advances in weak-convergence theory. We conclude with a discussion and refer to an accompanying technical report for extensions to general longitudinal data structures.
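
To make the double-robustness claim concrete, consider the familiar i.i.d. point-treatment special case; the paper's network setting generalizes this to dependent units, so the display below is a sketch, not the paper's full formulation. With data \(O = (W, A, Y)\), outcome regression \(\bar Q(a, w) = E[Y \mid A = a, W = w]\), treatment mechanism \(g(a \mid w)\), and a known stochastic intervention \(g^*(a \mid w)\), the target parameter and its efficient influence curve are

\[
\psi = E\Big[\sum_a g^*(a \mid W)\,\bar Q(a, W)\Big], \qquad
D^*(O) = \frac{g^*(A \mid W)}{g(A \mid W)}\big(Y - \bar Q(A, W)\big) + \sum_a g^*(a \mid W)\,\bar Q(a, W) - \psi,
\]

and the bias of a TMLE built from estimates \((\bar Q, g)\) is driven by the remainder

\[
R(\bar Q, g) = E_0\Big[\sum_a g^*(a \mid W)\,\big(\bar Q - \bar Q_0\big)(a, W)\,\frac{g_0(a \mid W) - g(a \mid W)}{g(a \mid W)}\Big],
\]

which vanishes when either nuisance parameter is estimated consistently and is second order (a product of two differences) otherwise, matching the second-order bias property described in the abstract.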


Biostatistics ◽  
2017 ◽  
Vol 19 (3) ◽  
pp. 374-390 ◽  
Author(s):  
Tingting Yu ◽  
Lang Wu ◽  
Peter B Gilbert

SUMMARY In HIV vaccine studies, a major research objective is to identify immune response biomarkers measured longitudinally that may be associated with risk of HIV infection. This objective can be assessed via joint modeling of longitudinal and survival data. Joint models for HIV vaccine data are complicated by the following issues: (i) left truncation of some longitudinal data due to lower limits of quantification; (ii) mixed types of longitudinal variables; (iii) measurement errors and missing values in longitudinal measurements; (iv) computational challenges associated with likelihood inference. In this article, we propose a joint model of complex longitudinal and survival data and a computationally efficient method for approximate likelihood inference to address the foregoing issues simultaneously. In particular, our model does not make unverifiable distributional assumptions for truncated values, which differs from methods commonly used in the literature. The parameters are estimated based on the h-likelihood method, which is computationally efficient and offers approximate likelihood inference. Moreover, we propose a new approach to estimating the standard errors of the h-likelihood-based parameter estimates by using an adaptive Gauss–Hermite method. Simulation studies show that our methods perform well and are computationally efficient. A comprehensive data analysis is also presented.
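
The adaptive Gauss–Hermite device used for the standard errors can be sketched generically in Python. The quadrature rule below is standard; the scalar random effect, the illustrative integrand, and the function names (adaptive_gh, log_f) are assumptions for the example, not the authors' implementation.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def adaptive_gh(log_f, n=15, h=1e-4):
    """Approximate the integral of exp(log_f(b)) db over the real line by
    centering and scaling the Gauss-Hermite nodes at the integrand's mode."""
    mode = minimize_scalar(lambda b: -log_f(b)).x          # adaptive center
    # Curvature at the mode (finite differences) gives the adaptive scale.
    d2 = (log_f(mode + h) - 2.0 * log_f(mode) + log_f(mode - h)) / h**2
    scale = 1.0 / np.sqrt(-d2)
    x, w = hermgauss(n)                                    # rule for e^{-x^2}
    b = mode + np.sqrt(2.0) * scale * x                    # transformed nodes
    # The substitution b = mode + sqrt(2)*scale*x introduces the e^{x^2} factor.
    return np.sqrt(2.0) * scale * np.sum(w * np.exp(log_f(b) + x**2))

# Illustrative check: integrate N(y = 1 | b, sd = 0.5) * N(b | 0, 1) over the
# random effect b. The exact marginal is the N(0, 1 + 0.25) density at y = 1.
log_f = lambda b: norm.logpdf(1.0, loc=b, scale=0.5) + norm.logpdf(b, 0.0, 1.0)
print(adaptive_gh(log_f), norm.pdf(1.0, 0.0, np.sqrt(1.25)))
```

With a handful of nodes this reproduces the exact Gaussian answer essentially to machine precision, which is why such quadrature is attractive for likelihoods that must be evaluated many times during estimation.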


2003 ◽  
Vol 25 (S1) ◽  
pp. S29-S35
Author(s):  
Vincent P. Diego ◽  
Larry Atwood ◽  
Rasika A. Mathias ◽  
Laura Almasy

AGE ◽  
2006 ◽  
Vol 28 (4) ◽  
pp. 363-374 ◽  
Author(s):  
Anatoli I. Yashin ◽  
Igor V. Akushevich ◽  
Konstantin G. Arbeev ◽  
Lucy Akushevich ◽  
Svetlana V. Ukraintseva ◽  
...  
