Penalized Regression for Multiple Types of Many Features With Missing Data

2023 ◽  
Author(s):  
Kin Yau Wong ◽  
Donglin Zeng ◽  
Danyu Lin
2018 ◽  
Vol 28 (5) ◽  
pp. 1311-1327 ◽  
Author(s):  
Faisal M Zahid ◽  
Christian Heumann

Missing data is a common issue that can compromise estimation and inference in biomedical, epidemiological and social research. Multiple imputation is an increasingly popular approach for handling missing data. When a large number of covariates have missing values, existing multiple imputation software packages may not work properly and often produce errors. We propose a multiple imputation algorithm called mispr based on sequential penalized regression models. Each variable with missing values is assumed to have a different distributional form and is imputed with its own imputation model using the ridge penalty. When the number of predictors is large relative to the sample size, the quadratic penalty guarantees unique parameter estimates and leads to better predictions than the usual maximum likelihood estimation (MLE), with a good compromise between bias and variance. As a result, the proposed algorithm performs well and provides better imputed values even for a large number of covariates with small samples. The results are compared with the existing software packages mice, VIM and Amelia in simulation studies. The missing at random mechanism was the main assumption in the simulation study. Imputation performance is evaluated with the mean squared imputation error and the mean absolute imputation error. The mean squared error ([Formula: see text]), parameter estimates with their standard errors and confidence intervals are also computed to compare the performance in the regression context. The proposed algorithm is observed to be a good competitor to the existing algorithms, with smaller mean squared imputation error, mean absolute imputation error and mean squared error. Its performance becomes considerably better than that of the existing algorithms as the number of covariates increases, especially when the number of predictors is close to or even greater than the sample size. Two real-life datasets are also used to examine the performance of the proposed algorithm using simulations.
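The idea of imputing each incomplete variable in turn with a ridge-penalized regression on the other variables can be illustrated with a short sketch. This is not the mispr implementation: it handles continuous variables only, produces a single deterministic imputation rather than multiple imputations, and every name in it (`ridge_fit_predict`, `sequential_ridge_impute`, `imputation_errors`, the penalty `lam`, the iteration count `n_iter`) is a hypothetical choice made for illustration. It also assumes every column has at least one observed value.

```python
import numpy as np


def ridge_fit_predict(X_train, y_train, X_new, lam):
    """Fit a ridge regression with an unpenalized intercept and predict X_new."""
    x_mean = X_train.mean(axis=0)
    y_mean = y_train.mean()
    Xc = X_train - x_mean
    p = Xc.shape[1]
    # The quadratic penalty lam * I keeps the system uniquely solvable
    # even when the number of predictors exceeds the number of rows.
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ (y_train - y_mean))
    return y_mean + (X_new - x_mean) @ beta


def sequential_ridge_impute(X, lam=1.0, n_iter=10):
    """Illustrative chained imputation: one ridge model per incomplete column."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()
    # Start from column means (assumes each column has observed values).
    col_means = np.nanmean(X, axis=0)
    filled[missing] = np.take(col_means, np.nonzero(missing)[1])
    for _ in range(n_iter):
        for j in np.nonzero(missing.any(axis=0))[0]:
            miss_j = missing[:, j]
            others = np.delete(filled, j, axis=1)
            # Train on rows where column j is observed, predict the rest.
            filled[miss_j, j] = ridge_fit_predict(
                others[~miss_j], filled[~miss_j, j], others[miss_j], lam
            )
    return filled


def imputation_errors(X_true, X_imputed, missing):
    """Mean squared and mean absolute imputation error over the missing cells."""
    diff = X_imputed[missing] - X_true[missing]
    return float(np.mean(diff ** 2)), float(np.mean(np.abs(diff)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 30, 40  # more covariates than observations
    factors = rng.normal(size=(n, 3))
    X_true = factors @ rng.normal(size=(3, p)) + 0.1 * rng.normal(size=(n, p))
    # Cells are deleted completely at random here, a special case of
    # missing at random, purely to keep the demonstration short.
    mask = rng.random((n, p)) < 0.1
    X_obs = X_true.copy()
    X_obs[mask] = np.nan
    X_imp = sequential_ridge_impute(X_obs, lam=1.0, n_iter=5)
    print(imputation_errors(X_true, X_imp, mask))
```

A full multiple imputation procedure would additionally draw imputed values from a predictive distribution, repeat the process to obtain several completed datasets, and use a model suited to each variable's type; the sketch omits all of that.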


1979 ◽  
Vol 24 (8) ◽  
pp. 670-670
Author(s):  
FRANZ R. EPTING ◽  
ALVIN W. LANDFIELD

1979 ◽  
Vol 24 (12) ◽  
pp. 1058-1058
Author(s):  
AL LANDFIELD ◽  
FRANZ EPTING

2013 ◽  
Author(s):  
Samantha Minski ◽  
Kristen Medina ◽  
Danielle Lespinasse ◽  
Stacey Maurer ◽  
Manal Alabduljabbar ◽  
...  
