Missing value imputation method for disaster decision-making using K nearest neighbor

2015 ◽  
Vol 43 (4) ◽  
pp. 767-781 ◽  
Author(s):  
Xiaofei Ma ◽  
Qiuyan Zhong


Author(s):  
Caio Ribeiro ◽  
Alex A. Freitas

Abstract
Longitudinal datasets of human ageing studies usually have a high volume of missing data, and one way to handle missing values in a dataset is to replace them with estimations. However, there are many methods to estimate missing values, and no single method is the best for all datasets. In this article, we propose a data-driven missing value imputation approach that performs a feature-wise selection of the best imputation method, using known information in the dataset to rank the five methods we selected, based on their estimation error rates. We evaluated the proposed approach in two sets of experiments: a classifier-independent scenario, where we compared the applicabilities and error rates of each imputation method; and a classifier-dependent scenario, where we compared the predictive accuracy of Random Forest classifiers generated with datasets prepared using each imputation method and a baseline approach of doing no imputation (letting the classification algorithm handle the missing values internally). Based on our results from both sets of experiments, we concluded that the proposed data-driven missing value imputation approach generally resulted in models with more accurate estimations for missing data and better-performing classifiers in longitudinal datasets of human ageing. We also observed that imputation methods devised specifically for longitudinal data had very accurate estimations. This reinforces the idea that using the temporal information intrinsic to longitudinal data is a worthwhile endeavour for machine learning applications, and that it can be achieved through the proposed data-driven approach.
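The core idea above, ranking candidate imputation methods per feature by their estimation error on deliberately hidden known values, can be sketched in a few lines. This is a minimal illustration, not the article's implementation: it uses only two candidate methods (mean and kNN) instead of the five the authors selected, and all data and function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a longitudinal dataset: 100 subjects, 4 features,
# where feature 1 is strongly correlated with feature 0.
X = rng.normal(size=(100, 4))
X[:, 1] = 0.9 * X[:, 0] + rng.normal(scale=0.1, size=100)

def impute_mean(col, mask):
    # Replace masked entries with the mean of the observed entries.
    return np.full(mask.sum(), col[~mask].mean())

def impute_knn(X, j, mask, k=5):
    # Estimate masked entries of column j from the k rows that are
    # closest on the remaining columns.
    others = np.delete(X, j, axis=1)
    out = []
    for i in np.where(mask)[0]:
        d = np.linalg.norm(others - others[i], axis=1)
        d[mask] = np.inf  # only borrow from rows observed in column j
        nn = np.argsort(d)[:k]
        out.append(X[nn, j].mean())
    return np.array(out)

def select_method(X, j, frac=0.2):
    # Feature-wise selection: hide a fraction of *known* values in
    # column j, impute them with each candidate method, and rank the
    # methods by mean absolute estimation error.
    held = rng.choice(len(X), size=int(frac * len(X)), replace=False)
    mask = np.zeros(len(X), dtype=bool)
    mask[held] = True
    truth = X[mask, j]
    errors = {
        "mean": np.abs(impute_mean(X[:, j], mask) - truth).mean(),
        "knn": np.abs(impute_knn(X, j, mask) - truth).mean(),
    }
    return min(errors, key=errors.get), errors

best, errors = select_method(X, 1)
print(best, errors)
```

Because feature 1 is nearly a linear function of feature 0, the kNN estimate wins for this column; for an uncorrelated column the ranking could flip, which is exactly why the selection is done per feature.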


Author(s):  
Yongsong Qin ◽  
Shichao Zhang ◽  
Chengqi Zhang

The k-nearest neighbor (kNN) imputation, as one of the most important research topics in incomplete data discovery, has been developed with great success on industrial data. However, it is difficult to obtain a mathematically valid and simple procedure for constructing confidence intervals to evaluate the imputed data. This chapter studies a new estimation for missing (or incomplete) data that combines kNN imputation with bootstrap-calibrated EL (Empirical Likelihood). The combination not only releases the burden of seeking a mathematically valid asymptotic theory for kNN imputation, but also inherits the advantages of the EL method over the normal approximation method. Simulation results demonstrate that the bootstrap-calibrated EL method performs quite well in estimating confidence intervals for data imputed with the kNN method.
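The two-stage pipeline described above, first fill the gaps with kNN imputation, then attach a calibrated confidence interval to the resulting estimate, can be sketched as follows. This is a simplified illustration under assumed data: the interval here uses a plain percentile bootstrap for the mean of the imputed response, standing in for the chapter's bootstrap-calibrated empirical-likelihood construction, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: covariate x is always observed, response y = 2x + noise
# is missing at random for roughly 30% of the sample (hypothetical).
n = 200
x = rng.uniform(0, 1, n)
y = 2.0 * x + rng.normal(scale=0.2, size=n)
miss = rng.random(n) < 0.3

def knn_impute(x, y, miss, k=5):
    # Fill each missing y with the mean response of the k observed
    # points whose covariate is closest.
    y_full = y.copy()
    obs = np.where(~miss)[0]
    for i in np.where(miss)[0]:
        nn = obs[np.argsort(np.abs(x[obs] - x[i]))[:k]]
        y_full[i] = y[nn].mean()
    return y_full

# Percentile bootstrap CI for E[y] after imputation: resample rows,
# re-impute within each resample, and take the 2.5/97.5 percentiles.
boots = []
for _ in range(300):
    idx = rng.integers(0, n, n)
    boots.append(knn_impute(x[idx], y[idx], miss[idx]).mean())
lo, hi = np.percentile(boots, [2.5, 97.5])
print(f"95% bootstrap CI for E[y]: ({lo:.3f}, {hi:.3f})")
```

Note that the imputation is redone inside every bootstrap resample; treating the imputed values as fixed data would understate the interval's width, which is the variance-accounting problem the EL calibration addresses more rigorously.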


2020 ◽  
Author(s):  
Nan Jiang ◽  
Yanan Li ◽  
Hua Zuo ◽  
Hui Zheng ◽  
Qinghe Zheng

Mathematics ◽  
2021 ◽  
Vol 9 (2) ◽  
pp. 119
Author(s):  
Shunichi Ohmori

This paper studies the integration of predictive and prescriptive analytics frameworks for deriving decisions from data. Traditionally, the purpose of predictive analytics is to derive predictions of unknown parameters from data using statistics and machine learning, while the purpose of prescriptive analytics is to derive a decision from known parameters using optimization technology. These have been studied independently, but the effect of the prediction error in predictive analytics on the decision-making in prescriptive analytics has not been clarified. We propose a modeling framework that integrates machine learning and robust optimization. The proposed algorithm utilizes the k-nearest neighbor model to predict the distribution of uncertain parameters based on the observed auxiliary data. The minimum-volume ellipsoid enclosing the k nearest neighbors of the observed auxiliary data is used to form the uncertainty set for the robust optimization formulation. We illustrate the data-driven decision-making framework and our novel robustness notion on a two-stage linear stochastic program under uncertain parameters. The problem can be reduced to a convex program, and thus can be solved to optimality very efficiently by off-the-shelf solvers.
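The geometric construction described above can be sketched numerically. This is an illustrative simplification, not the paper's method: a true minimum-volume ellipsoid requires a convex solver, so the sketch uses a covariance-shaped ellipsoid inflated just enough to enclose all k neighbors, and all data and names are hypothetical. The worst case of a linear cost over an ellipsoid {u : (u - mu)^T A^{-1} (u - mu) <= 1} has the closed form c^T mu + sqrt(c^T A c), which is what makes the resulting robust problem tractable.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy setting: 2-D auxiliary covariate x predicts a 2-D uncertain
# demand vector d (hypothetical data-generating process).
N = 300
x = rng.normal(size=(N, 2))
d = np.stack([1.0 + x[:, 0], 2.0 + x[:, 1]], axis=1) \
    + rng.normal(scale=0.3, size=(N, 2))

def knn_ellipsoid(x, d, x0, k=20):
    # Ellipsoid (center mu, shape A) covering the demands of the k
    # nearest neighbors of the query covariate x0.  The uncertainty
    # set is {u : (u - mu)^T A^{-1} (u - mu) <= 1}.  A covariance
    # ellipsoid inflated to enclose every neighbor stands in for the
    # paper's minimum-volume ellipsoid.
    nn = np.argsort(np.linalg.norm(x - x0, axis=1))[:k]
    pts = d[nn]
    mu = pts.mean(axis=0)
    cov = np.cov(pts.T)
    # Squared Mahalanobis distances; scale cov so the farthest
    # neighbor sits exactly on the ellipsoid boundary.
    md2 = np.einsum('ij,jk,ik->i', pts - mu, np.linalg.inv(cov), pts - mu)
    A = cov * md2.max()
    return mu, A

mu, A = knn_ellipsoid(x, d, x0=np.zeros(2))

# Robust (worst-case) value of a linear cost c^T u over the ellipsoid.
c = np.array([1.0, 1.0])
worst = c @ mu + np.sqrt(c @ A @ c)
print(mu, worst)
```

Plugging this closed-form worst case into the objective is what reduces the robust counterpart to an ordinary convex program, as the abstract notes.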

