Improved imputation methods for missing data in two-occasion successive sampling

Author(s):  
Garib Nath Singh ◽  
Ashok Kumar Jaiswal ◽  
Awadhesh K. Pandey
2021 ◽  
Author(s):  
M. B. Mohammed ◽  
H. S. Zulkafli ◽  
M. B. Adam ◽  
N. Ali ◽  
I. A. Baba

2021 ◽  
Author(s):  
Rahibu A. Abassi ◽  
Amina S. Msengwa ◽  
Rocky R. J. Akarro

Abstract Background Clinical data are at risk of having missing or incomplete values for several reasons including patients’ failure to attend clinical measurements, wrong interpretations of measurements, and measurement recorder’s defects. Missing data can significantly affect the analysis and results might be doubtful due to bias caused by omission of missed observation during statistical analysis especially if a dataset is considerably small. The objective of this study is to compare several imputation methods in terms of efficiency in filling-in the missing data so as to increase the prediction and classification accuracy in breast cancer dataset. Methods Five imputation methods namely series mean, k-nearest neighbour, hot deck, predictive mean matching, and multiple imputations were applied to replace the missing values to the real breast cancer dataset. The efficiency of imputation methods was compared by using the Root Mean Square Errors and Mean Absolute Errors to obtain a suitable complete dataset. Binary logistic regression and linear discrimination classifiers were applied to the imputed dataset to compare their efficacy on classification and discrimination. Results The evaluation of imputation methods revealed that the predictive mean matching method was better off compared to other imputation methods. In addition, the binary logistic regression and linear discriminant analyses yield almost similar values on overall classification rates, sensitivity and specificity. Conclusion The predictive mean matching imputation showed higher accuracy in estimating and replacing missing/incomplete data values in a real breast cancer dataset under the study. It is a more effective and good method to handle missing data in this scenario. We recommend to replace missing data by using predictive mean matching since it is a plausible approach toward multiple imputations for numerical variables, as it improves estimation and prediction accuracy over the use complete-case analysis especially when percentage of missing data is not very small.


Author(s):  
Rie Toyomoto ◽  
Satoshi Funada ◽  
Toshi A. Furukawa

Sign in / Sign up

Export Citation Format

Share Document