Missing value imputation method for disaster decision-making using K nearest neighbor

2015 ◽  
Vol 43 (4) ◽  
pp. 767-781 ◽  
Author(s):  
Xiaofei Ma ◽  
Qiuyan Zhong


Author(s):  
Caio Ribeiro ◽  
Alex A. Freitas

Abstract
Longitudinal datasets of human ageing studies usually have a high volume of missing data, and one way to handle missing values in a dataset is to replace them with estimations. However, there are many methods to estimate missing values, and no single method is the best for all datasets. In this article, we propose a data-driven missing value imputation approach that performs a feature-wise selection of the best imputation method, using known information in the dataset to rank the five methods we selected, based on their estimation error rates. We evaluated the proposed approach in two sets of experiments: a classifier-independent scenario, where we compared the applicabilities and error rates of each imputation method; and a classifier-dependent scenario, where we compared the predictive accuracy of Random Forest classifiers generated with datasets prepared using each imputation method and a baseline approach of doing no imputation (letting the classification algorithm handle the missing values internally). Based on our results from both sets of experiments, we concluded that the proposed data-driven missing value imputation approach generally resulted in models with more accurate estimations for missing data and better-performing classifiers in longitudinal datasets of human ageing. We also observed that imputation methods devised specifically for longitudinal data had very accurate estimations. This reinforces the idea that using the temporal information intrinsic to longitudinal data is a worthwhile endeavour for machine learning applications, and that it can be achieved through the proposed data-driven approach.
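The core idea above, ranking candidate imputation methods per feature by their estimation error on deliberately hidden known values, can be sketched in a few lines. This is a minimal illustration, not the article's implementation: it uses only two candidate methods (mean and kNN) instead of the five the authors selected, and all data and function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a longitudinal dataset: 100 subjects, 4 features,
# where feature 1 is strongly correlated with feature 0.
X = rng.normal(size=(100, 4))
X[:, 1] = 0.9 * X[:, 0] + rng.normal(scale=0.1, size=100)

def impute_mean(col, mask):
    # Replace masked entries with the mean of the observed entries.
    return np.full(mask.sum(), col[~mask].mean())

def impute_knn(X, j, mask, k=5):
    # Estimate masked entries of column j from the k rows that are
    # closest on the remaining columns.
    others = np.delete(X, j, axis=1)
    out = []
    for i in np.where(mask)[0]:
        d = np.linalg.norm(others - others[i], axis=1)
        d[mask] = np.inf  # only borrow from rows observed in column j
        nn = np.argsort(d)[:k]
        out.append(X[nn, j].mean())
    return np.array(out)

def select_method(X, j, frac=0.2):
    # Feature-wise selection: hide a fraction of *known* values in
    # column j, impute them with each candidate method, and rank the
    # methods by mean absolute estimation error.
    held = rng.choice(len(X), size=int(frac * len(X)), replace=False)
    mask = np.zeros(len(X), dtype=bool)
    mask[held] = True
    truth = X[mask, j]
    errors = {
        "mean": np.abs(impute_mean(X[:, j], mask) - truth).mean(),
        "knn": np.abs(impute_knn(X, j, mask) - truth).mean(),
    }
    return min(errors, key=errors.get), errors

best, errors = select_method(X, 1)
print(best, errors)
```

Because feature 1 is nearly a linear function of feature 0, the kNN estimate wins for this column; for an uncorrelated column the ranking could flip, which is exactly why the selection is done per feature.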


Author(s):  
Yongsong Qin ◽  
Shichao Zhang ◽  
Chengqi Zhang

The k-nearest neighbor (kNN) imputation, as one of the most important research topics in incomplete data discovery, has been developed with great success on industrial data. However, it is difficult to obtain a mathematically valid and simple procedure for constructing confidence intervals to evaluate the imputed data. This chapter studies a new estimation for missing (or incomplete) data that combines kNN imputation with bootstrap-calibrated EL (Empirical Likelihood). The combination not only releases the burden of seeking a mathematically valid asymptotic theory for kNN imputation, but also inherits the advantages of the EL method over the normal approximation method. Simulation results demonstrate that the bootstrap-calibrated EL method performs quite well in estimating confidence intervals for data imputed with the kNN method.
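The two-stage pipeline described above, first fill the gaps with kNN imputation, then attach a calibrated confidence interval to the resulting estimate, can be sketched as follows. This is a simplified illustration under assumed data: the interval here uses a plain percentile bootstrap for the mean of the imputed response, standing in for the chapter's bootstrap-calibrated empirical-likelihood construction, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: covariate x is always observed, response y = 2x + noise
# is missing at random for roughly 30% of the sample (hypothetical).
n = 200
x = rng.uniform(0, 1, n)
y = 2.0 * x + rng.normal(scale=0.2, size=n)
miss = rng.random(n) < 0.3

def knn_impute(x, y, miss, k=5):
    # Fill each missing y with the mean response of the k observed
    # points whose covariate is closest.
    y_full = y.copy()
    obs = np.where(~miss)[0]
    for i in np.where(miss)[0]:
        nn = obs[np.argsort(np.abs(x[obs] - x[i]))[:k]]
        y_full[i] = y[nn].mean()
    return y_full

# Percentile bootstrap CI for E[y] after imputation: resample rows,
# re-impute within each resample, and take the 2.5/97.5 percentiles.
boots = []
for _ in range(300):
    idx = rng.integers(0, n, n)
    boots.append(knn_impute(x[idx], y[idx], miss[idx]).mean())
lo, hi = np.percentile(boots, [2.5, 97.5])
print(f"95% bootstrap CI for E[y]: ({lo:.3f}, {hi:.3f})")
```

Note that the imputation is redone inside every bootstrap resample; treating the imputed values as fixed data would understate the interval's width, which is the variance-accounting problem the EL calibration addresses more rigorously.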


2020 ◽  
Author(s):  
Nan Jiang ◽  
Yanan Li ◽  
Hua Zuo ◽  
Hui Zheng ◽  
Qinghe Zheng

Mathematics ◽  
2021 ◽  
Vol 9 (2) ◽  
pp. 119
Author(s):  
Shunichi Ohmori

This paper studies the integration of predictive and prescriptive analytics frameworks for deriving decisions from data. Traditionally, the purpose of predictive analytics is to derive predictions of unknown parameters from data using statistics and machine learning, while the purpose of prescriptive analytics is to derive a decision from known parameters using optimization technology. These have been studied independently, but the effect of the prediction error in predictive analytics on the decision-making in prescriptive analytics has not been clarified. We propose a modeling framework that integrates machine learning and robust optimization. The proposed algorithm utilizes the k-nearest neighbor model to predict the distribution of uncertain parameters based on the observed auxiliary data. The minimum-volume ellipsoid enclosing the k nearest neighbors of the observed auxiliary data is used to form the uncertainty set for the robust optimization formulation. We illustrate the data-driven decision-making framework and our novel robustness notion on a two-stage linear stochastic program under uncertain parameters. The problem can be reduced to a convex program, and thus can be solved to optimality very efficiently by off-the-shelf solvers.
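The geometric construction described above can be sketched numerically. This is an illustrative simplification, not the paper's method: a true minimum-volume ellipsoid requires a convex solver, so the sketch uses a covariance-shaped ellipsoid inflated just enough to enclose all k neighbors, and all data and names are hypothetical. The worst case of a linear cost over an ellipsoid {u : (u - mu)^T A^{-1} (u - mu) <= 1} has the closed form c^T mu + sqrt(c^T A c), which is what makes the resulting robust problem tractable.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy setting: 2-D auxiliary covariate x predicts a 2-D uncertain
# demand vector d (hypothetical data-generating process).
N = 300
x = rng.normal(size=(N, 2))
d = np.stack([1.0 + x[:, 0], 2.0 + x[:, 1]], axis=1) \
    + rng.normal(scale=0.3, size=(N, 2))

def knn_ellipsoid(x, d, x0, k=20):
    # Ellipsoid (center mu, shape A) covering the demands of the k
    # nearest neighbors of the query covariate x0.  The uncertainty
    # set is {u : (u - mu)^T A^{-1} (u - mu) <= 1}.  A covariance
    # ellipsoid inflated to enclose every neighbor stands in for the
    # paper's minimum-volume ellipsoid.
    nn = np.argsort(np.linalg.norm(x - x0, axis=1))[:k]
    pts = d[nn]
    mu = pts.mean(axis=0)
    cov = np.cov(pts.T)
    # Squared Mahalanobis distances; scale cov so the farthest
    # neighbor sits exactly on the ellipsoid boundary.
    md2 = np.einsum('ij,jk,ik->i', pts - mu, np.linalg.inv(cov), pts - mu)
    A = cov * md2.max()
    return mu, A

mu, A = knn_ellipsoid(x, d, x0=np.zeros(2))

# Robust (worst-case) value of a linear cost c^T u over the ellipsoid.
c = np.array([1.0, 1.0])
worst = c @ mu + np.sqrt(c @ A @ c)
print(mu, worst)
```

Plugging this closed-form worst case into the objective is what reduces the robust counterpart to an ordinary convex program, as the abstract notes.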

