Non-parametric approach for frequentist multiple imputation in survival analysis with missing covariates

2021 ◽  
pp. 096228022110111
Author(s):  
Yoshinori Takeuchi ◽  
Mitsunori Ogawa ◽  
Yasuhiro Hagiwara ◽  
Yutaka Matsuyama

In clinical and epidemiological studies using survival analysis, some explanatory variables are often missing. When this occurs, multiple imputation (MI) is frequently used in practice. In many cases, simple parametric imputation models are routinely adopted without checking the validity of the model specification. Misspecified imputation models can cause biased parameter estimates. In this study, we describe novel frequentist type MI procedures for survival analysis using proportional and additive hazards models. The procedures are based on non-parametric estimation techniques and do not require the correct specification of parametric imputation models. For continuous missing covariates, we first sample imputation values from a parametric imputation model. Then, we obtain estimates by solving the estimating equation modified by non-parametrically estimated conditional densities. For categorical missing covariates, we directly sample imputation values from a non-parametrically estimated conditional distribution and then obtain estimates by solving the corresponding estimating equation. We evaluate the performance of the proposed procedures using simulation studies: one uses simulated data; another uses data informed by parameters generated from a real-world medical claims database. We also applied the procedures to a pharmacoepidemiological study that examined the effect of antihyperlipidemics on hyperglycemia incidence.

Author(s):  
James Cui

The generalized estimating equation (GEE) approach is a widely used statistical method in the analysis of longitudinal data in clinical and epidemiological studies. It is an extension of the generalized linear model (GLM) method to correlated data such that valid standard errors of the parameter estimates can be drawn. Unlike the GLM method, which is based on the maximum likelihood theory for independent observations, the GEE method is based on the quasilikelihood theory and no assumption is made about the distribution of response observations. Therefore, Akaike's information criterion, a widely used method for model selection in GLM, is not applicable to GEE directly. However, Pan (Biometrics 2001; 57: 120–125) proposed a model-selection method for GEE and termed it quasilikelihood under the independence model criterion (QIC). This criterion can also be used to select the best working correlation structure. From Pan's methods, I developed a general Stata program, qic, that accommodates all the distribution and link functions and correlation structures available in Stata version 9. In this paper, I introduce this program and demonstrate how to use it to select the best working correlation structure and the best subset of covariates through two examples in longitudinal studies.
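
For orientation, the criterion proposed by Pan is commonly written as below. The notation is an assumption introduced here and is not taken from the abstract: R is the working correlation structure, I is the independence structure, Q is the quasilikelihood, p is the number of regression parameters, \hat{\Omega}_I is the model-based information matrix under independence, and \hat{V}_R is the robust (sandwich) covariance estimate under R.

```latex
% QIC for a GEE fit with working correlation R:
% the quasilikelihood is evaluated under independence at the estimate obtained under R.
\mathrm{QIC}(R) = -2\, Q\bigl(\hat{\beta}(R);\, I\bigr)
                  + 2\, \operatorname{trace}\bigl(\hat{\Omega}_I \hat{V}_R\bigr)

% Simplified variant often used for covariate selection,
% with the trace term approximated by the number of parameters p:
\mathrm{QIC}_u(R) = -2\, Q\bigl(\hat{\beta}(R);\, I\bigr) + 2p
```

Smaller values indicate the preferred model. The first form is the one typically used to compare working correlation structures, while the simplified form applies only to comparing covariate subsets.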


2021 ◽  
pp. 096228022110138
Author(s):  
Danilo Alvares ◽  
Fabio Paredes ◽  
Claudio Vargas ◽  
Catterina Ferreccio

A key hypothesis in epidemiological studies is that time to disease exposure provides relevant information to be considered in statistical models. However, the initiation time of a particular condition is usually unknown. Therefore, we developed a multiple imputation methodology for the age at onset of a particular condition, which is supported by incidence data from different sources of information. We introduced and illustrated this methodology using simulated data in order to examine the performance of our proposal. Then, we analyzed the association of gallstones and fatty liver disease in the Maule Cohort, a Chilean study of chronic diseases, using participants’ risk factors and six sources of information for the imputation of the age at occurrence of gallstones. Simulation studies showed that an increase in the proportion of imputed data does not affect the quality of the estimated coefficients associated with fully observed variables, while the imputed variable slowly reduces its effect. For the Chilean study, the categorized exposure time to gallstones is a significant variable, in which participants who had short and long exposure have, respectively, a 26.2% and 29.1% higher chance of developing fatty liver disease than non-exposed ones. In conclusion, our multiple imputation approach proved to be quite robust both in the linear/logistic regression simulation studies and in the real application, showing the great potential of this methodology.


2006 ◽  
Vol 25 (20) ◽  
pp. 3503-3517 ◽  
Author(s):  
Chiu-Hsieh Hsu ◽  
Jeremy M. G. Taylor ◽  
Susan Murray ◽  
Daniel Commenges

Entropy ◽  
2021 ◽  
Vol 23 (4) ◽  
pp. 384
Author(s):  
Rocío Hernández-Sanjaime ◽  
Martín González ◽  
Antonio Peñalver ◽  
Jose J. López-Espín

The presence of unaccounted heterogeneity in simultaneous equation models (SEMs) is frequently problematic in many real-life applications. Under the usual assumption of homogeneity, the model can be seriously misspecified, and it can potentially induce an important bias in the parameter estimates. This paper focuses on SEMs in which data are heterogeneous and tend to form clustering structures in the endogenous-variable dataset. Because the identification of different clusters is not straightforward, a two-step strategy that first forms groups among the endogenous observations and then uses the standard simultaneous equation scheme is provided. Methodologically, the proposed approach is based on a variational Bayes learning algorithm and does not need to be executed for varying numbers of groups in order to identify the one that adequately fits the data. We describe the statistical theory, evaluate the performance of the suggested algorithm by using simulated data, and apply the two-step method to a macroeconomic problem.


2009 ◽  
Vol 20 (2) ◽  
pp. 111-130 ◽  
Author(s):  
Ronaldo Dias ◽  
Nancy L. Garcia ◽  
Angelo Martarelli

Author(s):  
Juliet U. Elu ◽  
Gregory N. Price

Remittances have been recognized as an important determinant of economic growth for Sub-Saharan African economies as they can finance other determinants that constitute drivers of growth. To the extent that remittances finance terrorism, they can also inhibit economic growth as terrorism can constrain important drivers of growth such as investment and consumption expenditures. In this paper, we appeal to a theory of rational terrorism and consider whether remittances to Sub-Saharan Africa finance terrorism. We estimate the parameters of a static and dynamic terrorism incident supply function with maximum likelihood and Generalized Estimating Equation count data estimators for Sub-Saharan Africa between 1974 and 2006. Our parameter estimates suggest that for Sub-Saharan Africa, remittances are a source of finance for terrorism. We find that approximately one terrorism incident is financed in Sub-Saharan Africa for remittance inflows that range between approximately one quarter of a million dollars and one million dollars.

