Non-parametric approach for frequentist multiple imputation in survival analysis with missing covariates

In clinical and epidemiological studies using survival analysis, some explanatory variables are often missing. When this occurs, multiple imputation (MI) is frequently used in practice. In many cases, simple parametric imputation models are routinely adopted without checking the validity of the model specification. Misspecified imputation models can cause biased parameter estimates. In this study, we describe novel frequentist-type MI procedures for survival analysis using proportional and additive hazards models. The procedures are based on non-parametric estimation techniques and do not require the correct specification of parametric imputation models. For continuous missing covariates, we first sample imputation values from a parametric imputation model. Then, we obtain estimates by solving the estimating equation modified by non-parametrically estimated conditional densities. For categorical missing covariates, we directly sample imputation values from a non-parametrically estimated conditional distribution and then obtain estimates by solving the corresponding estimating equation. We evaluate the performance of the proposed procedures using two simulation studies: one uses simulated data; the other uses data informed by parameters generated from a real-world medical claims database. We also apply the procedures to a pharmacoepidemiological study that examined the effect of antihyperlipidemics on hyperglycemia incidence.
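
The categorical step described above (sampling imputation values from a non-parametrically estimated conditional distribution) can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not say which variables are conditioned on, so the sketch assumes a single fully observed discrete variable and uses empirical frequencies among complete cases. The function names and the record layout are hypothetical, and the subsequent step of solving the estimating equation is not shown.

```python
import random
from collections import Counter, defaultdict


def fit_conditional(records, target, given):
    """Empirical frequencies of `target` within each level of `given`,
    using only records where `target` is observed (not None)."""
    counts = defaultdict(Counter)
    for r in records:
        if r[target] is not None:
            counts[r[given]][r[target]] += 1
    return counts


def impute_once(records, target, given, counts, rng):
    """Return a completed copy of `records`; each missing `target` value
    is drawn from the empirical conditional distribution in `counts`."""
    completed = []
    for r in records:
        r = dict(r)  # copy so the input records are left untouched
        if r[target] is None:
            freq = counts.get(r[given])
            if not freq:
                # Assumption of this sketch: every stratum has at least
                # one complete case; otherwise nothing can be sampled.
                raise ValueError(f"no observed {target!r} for {given}={r[given]!r}")
            values = list(freq.keys())
            weights = list(freq.values())
            r[target] = rng.choices(values, weights=weights, k=1)[0]
        completed.append(r)
    return completed


def multiply_impute(records, target, given, m=5, seed=0):
    """Generate `m` completed data sets (hypothetical helper)."""
    rng = random.Random(seed)
    counts = fit_conditional(records, target, given)
    return [impute_once(records, target, given, counts, rng) for _ in range(m)]


if __name__ == "__main__":
    data = [
        {"group": "a", "smoker": "yes"},
        {"group": "a", "smoker": "no"},
        {"group": "a", "smoker": None},
        {"group": "b", "smoker": "no"},
        {"group": "b", "smoker": None},
    ]
    for completed in multiply_impute(data, "smoker", "group", m=3, seed=1):
        print([r["smoker"] for r in completed])
```

In a survival setting the conditioning set would normally also involve the outcome (time and event indicator); that detail is outside what the abstract states and is deliberately omitted here.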

QIC Program and Model Selection in GEE Analyses

The Stata Journal Promoting communications on statistics and Stata ◽

10.1177/1536867x0700700205 ◽

2007 ◽

Vol 7 (2) ◽

pp. 209-220 ◽

Cited By ~ 196

Author(s):

James Cui

Keyword(s):

Model Selection ◽

Information Criterion ◽

Estimating Equation ◽

Epidemiological Studies ◽

Correlated Data ◽

Correlation Structure ◽

Parameter Estimates ◽

Link Functions ◽

Independent Observations ◽

Working Correlation

The generalized estimating equation (GEE) approach is a widely used statistical method in the analysis of longitudinal data in clinical and epidemiological studies. It is an extension of the generalized linear model (GLM) method to correlated data such that valid standard errors of the parameter estimates can be obtained. Unlike the GLM method, which is based on the maximum likelihood theory for independent observations, the GEE method is based on the quasilikelihood theory and no assumption is made about the distribution of response observations. Therefore, Akaike's information criterion, a widely used method for model selection in GLM, is not applicable to GEE directly. However, Pan (Biometrics 2001; 57: 120–125) proposed a model-selection method for GEE and termed it quasilikelihood under the independence model criterion. This criterion can also be used to select the best working correlation structure. From Pan's methods, I developed a general Stata program, qic, that accommodates all the distribution and link functions and correlation structures available in Stata version 9. In this paper, I introduce this program and demonstrate how to use it to select the best working correlation structure and the best subset of covariates through two examples in longitudinal studies.
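
For orientation, the criterion proposed by Pan (2001) is commonly written as below. The notation is a conventional choice and is not taken from the abstract: R denotes the working correlation structure, I the independence structure, Q the quasilikelihood, \hat{\Omega}_I the model-based information matrix evaluated under independence, and \hat{V}_R the robust (sandwich) covariance estimate under R.

```latex
\mathrm{QIC}(R) = -2\, Q\bigl(\hat{\beta}(R);\, I\bigr)
                  + 2\, \operatorname{trace}\bigl(\hat{\Omega}_I \hat{V}_R\bigr)
```

As with Akaike's information criterion, the candidate model or working correlation structure with the smallest value is preferred.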

A strategy to impute age at onset of a particular condition from external sources

Statistical Methods in Medical Research ◽

10.1177/09622802211013830 ◽

2021 ◽

pp. 096228022110138

Author(s):

Danilo Alvares ◽

Fabio Paredes ◽

Claudio Vargas ◽

Catterina Ferreccio

Keyword(s):

Liver Disease ◽

Fatty Liver ◽

Multiple Imputation ◽

Fatty Liver Disease ◽

Age At Onset ◽

Simulated Data ◽

Relevant Information ◽

Epidemiological Studies ◽

Significant Variable ◽

Sources Of Information

A key hypothesis in epidemiological studies is that time to disease exposure provides relevant information to be considered in statistical models. However, the initiation time of a particular condition is usually unknown. Therefore, we developed a multiple imputation methodology for the age at onset of a particular condition, which is supported by incidence data from different sources of information. We introduced and illustrated this methodology using simulated data in order to examine the performance of our proposal. Then, we analyzed the association of gallstones and fatty liver disease in the Maule Cohort, a Chilean study of chronic diseases, using participants’ risk factors and six sources of information for the imputation of the age at occurrence of gallstones. Simulation studies showed that an increase in the proportion of imputed data does not affect the quality of the estimated coefficients associated with fully observed variables, while the imputed variable slowly reduces its effect. For the Chilean study, the categorized exposure time to gallstones is a significant variable: participants who had short and long exposure have, respectively, a 26.2% and 29.1% higher chance of developing fatty liver disease than non-exposed participants. In conclusion, our multiple imputation approach proved to be quite robust both in the linear/logistic regression simulation studies and in the real application, showing the great potential of this methodology.

Correcting bias due to missing stage data in the non-parametric estimation of stage-specific net survival for colorectal cancer using multiple imputation

Cancer Epidemiology ◽

10.1016/j.canep.2017.02.005 ◽

2017 ◽

Vol 48 ◽

pp. 16-21 ◽

Cited By ~ 5

Author(s):

Milena Falcaro ◽

James R. Carpenter

Keyword(s):

Colorectal Cancer ◽

Multiple Imputation ◽

Parametric Estimation ◽

Net Survival ◽

Non Parametric Estimation ◽

Non Parametric

Survival analysis using auxiliary variables via non-parametric multiple imputation

Statistics in Medicine ◽

10.1002/sim.2452 ◽

2006 ◽

Vol 25 (20) ◽

pp. 3503-3517 ◽

Cited By ~ 41

Author(s):

Chiu-Hsieh Hsu ◽

Jeremy M. G. Taylor ◽

Susan Murray ◽

Daniel Commenges

Keyword(s):

Survival Analysis ◽

Multiple Imputation ◽

Auxiliary Variables ◽

Non Parametric

Multiple Imputation Methods for the Missing Covariates in Generalized Estimating Equation

Biometrics ◽

10.2307/2533521 ◽

1997 ◽

Vol 53 (4) ◽

pp. 1538 ◽

Cited By ~ 22

Author(s):

Fang Xie ◽

Myunghee Cho Paik

Keyword(s):

Multiple Imputation ◽

Estimating Equation ◽

Generalized Estimating Equation ◽

Missing Covariates ◽

Imputation Methods ◽

Generalized Estimating

Breaking correlation breakdowns: non-parametric estimation of downturn correlations and their application in credit risk models

The Journal of Risk Model Validation ◽

10.21314/jrmv.2008.031 ◽

2008 ◽

Vol 2 (4) ◽

pp. 13-26

Author(s):

Oleg Burd

Keyword(s):

Credit Risk ◽

Parametric Estimation ◽

Risk Models ◽

Non Parametric Estimation ◽

Non Parametric

Estimating Simultaneous Equation Models through an Entropy-Based Incremental Variational Bayes Learning Algorithm

Entropy ◽

10.3390/e23040384 ◽

2021 ◽

Vol 23 (4) ◽

pp. 384

Author(s):

Rocío Hernández-Sanjaime ◽

Martín González ◽

Antonio Peñalver ◽

Jose J. López-Espín

Keyword(s):

Statistical Theory ◽

Learning Algorithm ◽

Real Life ◽

Simulated Data ◽

Simultaneous Equation ◽

Variational Bayes ◽

Parameter Estimates ◽

Step Method ◽

The One ◽

Simultaneous Equation Models

The presence of unaccounted heterogeneity in simultaneous equation models (SEMs) is frequently problematic in many real-life applications. Under the usual assumption of homogeneity, the model can be seriously misspecified, and it can potentially induce an important bias in the parameter estimates. This paper focuses on SEMs in which data are heterogeneous and tend to form clustering structures in the endogenous-variable dataset. Because the identification of different clusters is not straightforward, a two-step strategy that first forms groups among the endogenous observations and then uses the standard simultaneous equation scheme is provided. Methodologically, the proposed approach is based on a variational Bayes learning algorithm and does not need to be executed for varying numbers of groups in order to identify the one that adequately fits the data. We describe the statistical theory, evaluate the performance of the suggested algorithm by using simulated data, and apply the two-step method to a macroeconomic problem.

Parametric and non-parametric estimation for pairwise-interaction point processes

Probability Theory and Applications ◽

10.1515/9783112314227-010 ◽

1987 ◽

pp. 101-112

Keyword(s):

Point Processes ◽

Pairwise Interaction ◽

Parametric Estimation ◽

Interaction Point ◽

Non Parametric Estimation ◽

Non Parametric

Non-parametric estimation for aggregated functional data for electric load monitoring

Environmetrics ◽

10.1002/env.914 ◽

2009 ◽

Vol 20 (2) ◽

pp. 111-130 ◽

Cited By ~ 3

Author(s):

Ronaldo Dias ◽

Nancy L. Garcia ◽

Angelo Martarelli

Keyword(s):

Functional Data ◽

Parametric Estimation ◽

Electric Load ◽

Load Monitoring ◽

Non Parametric Estimation ◽

Non Parametric

Remittances and the Financing of Terrorism In Sub-Saharan Africa: 1974 - 2006

Peace Economics Peace Science and Public Policy ◽

10.1515/1554-8597.1257 ◽

2012 ◽

Vol 18 (1) ◽

Cited By ~ 8

Author(s):

Juliet U. Elu ◽

Gregory N. Price

Keyword(s):

Economic Growth ◽

Count Data ◽

Important Determinant ◽

Estimating Equation ◽

Sub Saharan Africa ◽

Generalized Estimating Equation ◽

Parameter Estimates ◽

Consumption Expenditures ◽

Sub Saharan ◽

Financing Of Terrorism

Remittances have been recognized as an important determinant of economic growth for Sub-Saharan African economies as they can finance other determinants that constitute drivers of growth. To the extent that remittances finance terrorism, they can also inhibit economic growth as terrorism can constrain important drivers of growth such as investment and consumption expenditures. In this paper, we appeal to a theory of rational terrorism and consider whether remittances to Sub-Saharan Africa finance terrorism. We estimate the parameters of a static and dynamic terrorism incident supply function with maximum likelihood and Generalized Estimating Equation count data estimators for Sub-Saharan Africa between 1974 and 2006. Our parameter estimates suggest that for Sub-Saharan Africa, remittances are a source of finance for terrorism. We find that approximately one terrorism incident is financed in Sub-Saharan Africa for remittance inflows that range between approximately one quarter of a million dollars and one million dollars.
