Penalized Regression for Multiple Types of Many Features With Missing Data

Statistica Sinica ◽

10.5705/ss.202020.0401 ◽

2023 ◽

Author(s):

Kin Yau Wong ◽

Donglin Zeng ◽

Danyu Lin

Keyword(s):

Missing Data ◽

Penalized Regression

Multiple imputation with sequential penalized regression

Statistical Methods in Medical Research ◽

10.1177/0962280218755574 ◽

2018 ◽

Vol 28 (5) ◽

pp. 1311-1327 ◽

Author(s):

Faisal M Zahid ◽

Christian Heumann

Keyword(s):

Missing Data ◽

Sample Size ◽

Multiple Imputation ◽

Missing Values ◽

Mean Squared Error ◽

Real Life ◽

Penalized Regression ◽

Parameter Estimates ◽

Squared Error ◽

Software Packages

Missing data is a common issue that can cause problems in estimation and inference in biomedical, epidemiological and social research. Multiple imputation is an increasingly popular approach for handling missing data. When a large number of covariates have missing data, existing multiple imputation software packages may not work properly and often produce errors. We propose a multiple imputation algorithm called mispr based on sequential penalized regression models. Each variable with missing values is assumed to have a different distributional form and is imputed with its own imputation model using the ridge penalty. When the number of predictors is large relative to the sample size, the quadratic penalty guarantees unique parameter estimates and leads to better predictions than the usual maximum likelihood estimation (MLE), with a good compromise between bias and variance. As a result, the proposed algorithm performs well and provides better imputed values even for a large number of covariates with small samples. The results are compared with the existing software packages mice, VIM and Amelia in simulation studies. The missing at random mechanism was the main assumption in the simulation study. Imputation performance is evaluated with the mean squared imputation error and the mean absolute imputation error. The mean squared error, parameter estimates with their standard errors, and confidence intervals are also computed to compare performance in the regression context. The proposed algorithm is observed to be a good competitor to the existing algorithms, with smaller mean squared imputation error, mean absolute imputation error and mean squared error. Its performance becomes considerably better than that of the existing algorithms as the number of covariates increases, especially when the number of predictors is close to or even greater than the sample size. Two real-life datasets are also used to examine the performance of the proposed algorithm using simulations.
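The idea in the abstract above, imputing each incomplete variable in turn from all the others with a ridge-penalized model, can be illustrated with a minimal sketch. This is not the authors' mispr implementation: it assumes every column is continuous, uses one fixed penalty `lam`, and produces a single deterministic imputation rather than multiple imputations with random draws. The names `ridge_fit`, `sequential_ridge_impute`, `lam` and `n_iter` are hypothetical and chosen only for illustration.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    # Adding lam * I keeps the system solvable even when columns outnumber rows.
    gram = Xc.T @ Xc + lam * np.eye(X.shape[1])
    beta = np.linalg.solve(gram, Xc.T @ (y - y_mean))
    return beta, y_mean - x_mean @ beta


def sequential_ridge_impute(data, lam=1.0, n_iter=10):
    """Fill NaN entries by cycling ridge regressions over incomplete columns.

    Assumption: all columns are continuous; this is a simplified single
    imputation, not a full multiple-imputation procedure.
    """
    data = np.asarray(data, dtype=float)
    missing = np.isnan(data)
    filled = data.copy()
    # Initialize missing cells with the observed mean of their column.
    col_means = np.nanmean(data, axis=0)
    filled[missing] = np.take(col_means, np.nonzero(missing)[1])
    incomplete_cols = np.nonzero(missing.any(axis=0))[0]
    for _ in range(n_iter):
        for j in incomplete_cols:
            obs = ~missing[:, j]
            others = np.delete(filled, j, axis=1)
            # Fit on rows where column j is observed, predict the rest.
            beta, intercept = ridge_fit(others[obs], filled[obs, j], lam)
            filled[~obs, j] = others[~obs] @ beta + intercept
    return filled


# Toy data with more variables (40) than observations (30).
rng = np.random.default_rng(0)
X_full = rng.normal(size=(30, 40))
X_miss = X_full.copy()
X_miss[rng.random(X_full.shape) < 0.1] = np.nan
imputed = sequential_ridge_impute(X_miss, lam=1.0)
```

The toy data deliberately has more columns than rows: an unpenalized least-squares fit would not have a unique solution there, while the ridge system stays solvable for any positive `lam`.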

A Monte Carlo simulation on penalized regression and missing data techniques for social science large-scale data

Korean Society for Educational Evaluation ◽

10.31158/jeev.2019.32.4.755 ◽

2019 ◽

Vol 32 (4) ◽

pp. 755-776

Author(s):

Minjeong Rho ◽

Jin Eun Yoo

Keyword(s):

Missing Data ◽

Social Science ◽

Large Scale ◽

Penalized Regression ◽

Large Scale Data ◽

Missing Data Techniques

Preventing and Treating Missing Data in Longitudinal Clinical Trials

10.1017/cbo9781139381666 ◽

2013 ◽

Author(s):

Craig Mallinckrodt

Keyword(s):

Clinical Trials

MISSING DATA

Contemporary Psychology ◽

10.1037/018832 ◽

1979 ◽

Vol 24 (8) ◽

pp. 670-670

Author(s):

FRANZ R. EPTING ◽

ALVIN W. LANDFIELD

MORE MISSING DATA

Contemporary Psychology ◽

10.1037/017889 ◽

1979 ◽

Vol 24 (12) ◽

pp. 1058-1058

Author(s):

AL LANDFIELD ◽

FRANZ EPTING

Estimation of missing data in psychophysiological research: Habituation should not be ignored

PsycEXTRA Dataset ◽

10.1037/e526132012-117 ◽

1996 ◽

Author(s):

John J. Curtin ◽

Christopher J. Patrick

Keyword(s):

Missing Data ◽

Psychophysiological Research ◽

Estimation Of Missing Data

Impact of missing data in a weight control trial

PsycEXTRA Dataset ◽

10.1037/e546872013-034 ◽

2013 ◽

Author(s):

Samantha Minski ◽

Kristen Medina ◽

Danielle Lespinasse ◽

Stacey Maurer ◽

Manal Alabduljabbar ◽

...

Keyword(s):

Missing Data

Consumer Outcomes Monthly Update Missing Data Report 5 released

PsycEXTRA Dataset ◽

10.1037/e570752006-004 ◽

2005 ◽

Advances in Multi-level Psychometric Models: Latent Variable Modeling of Growth with Missing Data and Multilevel Data

PsycEXTRA Dataset ◽

10.1037/e670192011-001 ◽

1993 ◽

Author(s):

Bengt O. Muthen

Keyword(s):

Missing Data ◽

Multilevel Data

Supplemental Material for Model Specification for Nonlinearity and Heterogeneity of Regression in Randomized Pretest Posttest Studies: Practical Solutions for Missing Data

Psychological Methods ◽

10.1037/met0000364.supp ◽

2020 ◽

Keyword(s):

Missing Data ◽

Model Specification
