Missing Data Imputation by Nearest-neighbor Trained BP for Fuzzy Clustering

Missing data is a common problem in wireless sensor networks. To preserve the performance of data processing, missing values are usually imputed before the sensor data are analyzed. This paper presents temporal and spatial nearest neighbor values-based missing data imputation (TSNN), a new imputation method based on temporal and spatial nearest neighbor values. First, four nearest neighbor values are defined in terms of the space and time dimensions and of geometrical and data distances; these form the basis of the algorithm and help it exploit the correlations among sensor data on the nodes using a regression tool. Next, the algorithm is described together with its two parameters, the best number of neighbors and the spatial–temporal coefficient. Finally, the algorithm is tested on an indoor and an outdoor wireless sensor network, and the results show that TSNN improves imputation accuracy and increases the number of cases that can be imputed effectively.
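
The idea of combining temporal and spatial neighbors can be illustrated with a minimal sketch. This is not the TSNN algorithm itself: the abstract does not give its regression step or the exact definitions of the four neighbor values, so the sketch below simply blends a temporal estimate (the nearest observed readings on the same node) with a spatial estimate (the k geometrically nearest nodes at the same time). The function names, the data layout, and the blending weight `alpha`, which stands in for the spatial–temporal coefficient, are all assumptions made for illustration.

```python
import math


def temporal_estimate(series, t):
    """Mean of the nearest observed values before and after index t on one node."""
    before = next((series[i] for i in range(t - 1, -1, -1) if series[i] is not None), None)
    after = next((series[i] for i in range(t + 1, len(series)) if series[i] is not None), None)
    found = [v for v in (before, after) if v is not None]
    return sum(found) / len(found) if found else None


def spatial_estimate(readings, positions, node, t, k):
    """Mean reading at time t over the k geometrically nearest nodes with a value."""
    candidates = [
        (math.dist(positions[node], positions[other]), series[t])
        for other, series in readings.items()
        if other != node and series[t] is not None
    ]
    nearest = sorted(candidates)[:k]
    return sum(v for _, v in nearest) / len(nearest) if nearest else None


def impute(readings, positions, node, t, k=2, alpha=0.5):
    """Blend the two estimates; alpha is a hypothetical spatial-temporal weight."""
    temporal = temporal_estimate(readings[node], t)
    spatial = spatial_estimate(readings, positions, node, t, k)
    if temporal is None:
        return spatial
    if spatial is None:
        return temporal
    return alpha * temporal + (1 - alpha) * spatial
```

Here `readings` maps a node name to its time series, with `None` marking a missing reading, and `positions` maps a node name to its coordinates. With `alpha=1` only the node's own history is used, with `alpha=0` only its neighbors; falling back to whichever estimate is available is one simple way to increase the number of cases that can be imputed.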


Integrating WLI fuzzy clustering with grey neural network for missing data imputation

International Journal of Intelligent Enterprise ◽

10.1504/ijie.2017.087011 ◽

2017 ◽

Vol 4 (1/2) ◽

pp. 103 ◽

Cited By ~ 2

Author(s):

Vijayakumar Kuppusamy ◽

Ilango Paramasivam

Keyword(s):

Neural Network ◽

Missing Data ◽

Fuzzy Clustering ◽

Data Imputation ◽

Missing Data Imputation


Integrating WLI fuzzy clustering with grey neural network for missing data imputation

International Journal of Intelligent Enterprise ◽

10.1504/ijie.2017.10008151 ◽

2017 ◽

Vol 4 (1/2) ◽

pp. 103

Author(s):

Vijayakumar Kuppusamy ◽

Ilango Paramasivam

Keyword(s):

Neural Network ◽

Missing Data ◽

Fuzzy Clustering ◽

Data Imputation ◽

Missing Data Imputation


Missing data imputation using Evolutionary k-Nearest neighbor algorithm for gene expression data

2016 Sixteenth International Conference on Advances in ICT for Emerging Regions (ICTer) ◽

10.1109/icter.2016.7829911 ◽

2016 ◽

Cited By ~ 4

Author(s):

Hiroshi de Silva ◽

A. Shehan Perera

Keyword(s):

Gene Expression ◽

Missing Data ◽

Gene Expression Data ◽

Nearest Neighbor ◽

Expression Data ◽

Data Imputation ◽

K Nearest Neighbor ◽

Nearest Neighbor Algorithm ◽

Missing Data Imputation ◽

K Nearest Neighbor Algorithm


TOBMI: trans-omics block missing data imputation using a k-nearest neighbor weighted approach

Bioinformatics ◽

10.1093/bioinformatics/bty796 ◽

2018 ◽

Vol 35 (8) ◽

pp. 1278-1283 ◽

Cited By ~ 6

Author(s):

Xuesi Dong ◽

Lijuan Lin ◽

Ruyang Zhang ◽

Yang Zhao ◽

David C Christiani ◽

...

Keyword(s):

Missing Data ◽

Nearest Neighbor ◽

Data Imputation ◽

K Nearest Neighbor ◽

Missing Data Imputation


K-Nearest Neighbor (K-NN) based Missing Data Imputation

2019 5th International Conference on Science in Information Technology (ICSITech) ◽

10.1109/icsitech46713.2019.8987530 ◽

2019 ◽

Cited By ~ 1

Author(s):

Della Murbarani Prawidya Murti ◽

Utomo Pujianto ◽

Aji Prasetya Wibawa ◽

Muhammad Iqbal Akbar

Keyword(s):

Missing Data ◽

Nearest Neighbor ◽

Data Imputation ◽

K Nearest Neighbor ◽

Missing Data Imputation


Missing Data Imputation for Multisite Rainfall Networks: A Comparison between Geostatistical Interpolation and Pattern-Based Estimation on Different Terrain Types

Journal of Hydrometeorology ◽

10.1175/jhm-d-19-0220.1 ◽

2020 ◽

Vol 21 (10) ◽

pp. 2325-2341

Author(s):

Fabio Oriani ◽

Simon Stisen ◽

Mehmet C. Demirel ◽

Gregoire Mariethoz

Keyword(s):

Missing Data ◽

Hydrological Modeling ◽

Nearest Neighbor ◽

Training Dataset ◽

Average Error ◽

Multiple Point ◽

Data Imputation ◽

Missing Data Imputation ◽

Mountainous Regions ◽

Geostatistical Interpolation

Abstract: Missing rainfall data are a major limitation for distributed hydrological modeling and climate studies. Practitioners need reliable approaches that can be employed on a daily basis, often with data too spatially limited to feed complex predictive models. In this study we compare different automatic approaches for missing data imputation, including geostatistical interpolation and pattern-based estimation algorithms. We introduce two pattern-based approaches based on the analysis of historical data patterns: (i) an iterative version of K-nearest neighbor (IKNN) and (ii) a new algorithm called vector sampling (VS) that combines concepts of multiple-point statistics and resampling. Both algorithms can draw estimations from variably incomplete data patterns, allowing the target dataset to serve at the same time as the training dataset. Tested on five case studies from Denmark, Australia, and Switzerland, the algorithms perform differently in a way that seems to be related to the terrain type: on flat terrains with spatially homogeneous rain events, geostatistical interpolation tends to minimize the average error, while in mountainous regions with nonstationary rainfall statistics, data mining can better recover the rainfall patterns. The VS algorithm, requiring minimal parameterization, turns out to be a convenient option for routine application on complex and poorly gauged terrains.
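
A generic iterative K-nearest neighbor imputation, in which the incomplete dataset also serves as its own training data, might look like the sketch below. It is not the IKNN variant from the paper, whose details the abstract does not give. The assumptions here are that missing entries start at the column mean, that distances are Euclidean over the other columns using the current estimates, and that every column has at least one observed value; the function name and parameters are hypothetical.

```python
import math


def iterative_knn_impute(rows, k=2, max_iter=20, tol=1e-6):
    """Fill None entries by repeatedly averaging the k most similar rows."""
    n_cols = len(rows[0])
    missing = [(i, j) for i, row in enumerate(rows)
               for j, v in enumerate(row) if v is None]

    # Initial guess: column mean of the observed values (assumes at least one).
    filled = [list(row) for row in rows]
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        mean = sum(observed) / len(observed)
        for i in range(len(rows)):
            if filled[i][j] is None:
                filled[i][j] = mean

    for _ in range(max_iter):
        snapshot = [list(row) for row in filled]  # estimates from the last pass
        largest_change = 0.0
        for i, j in missing:
            # Compare rows on every column except the one being imputed.
            target = [v for c, v in enumerate(snapshot[i]) if c != j]
            distances = sorted(
                (math.dist(target, [v for c, v in enumerate(snapshot[o]) if c != j]), o)
                for o in range(len(snapshot)) if o != i
            )
            neighbours = [o for _, o in distances[:k]]
            new_value = sum(snapshot[o][j] for o in neighbours) / len(neighbours)
            largest_change = max(largest_change, abs(new_value - filled[i][j]))
            filled[i][j] = new_value
        if largest_change < tol:  # estimates have stopped moving
            break
    return filled
```

Because neighbors may themselves contain imputed entries, each pass refines the previous estimates, which is what lets such a scheme work on variably incomplete rows.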


Missing data imputation using decision trees and fuzzy clustering with iterative learning

Knowledge and Information Systems ◽

10.1007/s10115-019-01427-1 ◽

2019 ◽

Vol 62 (6) ◽

pp. 2419-2437 ◽

Cited By ~ 2

Author(s):

Sanaz Nikfalazar ◽

Chung-Hsing Yeh ◽

Susan Bedingfield ◽

Hadi A. Khorshidi

Keyword(s):

Missing Data ◽

Decision Trees ◽

Fuzzy Clustering ◽

Iterative Learning ◽

Data Imputation ◽

Missing Data Imputation


Data Imputation Methods for Missing Values in the Context of Clustering

Big Data and Knowledge Sharing in Virtual Organizations - Advances in Knowledge Acquisition, Transfer, and Management ◽

10.4018/978-1-5225-7519-1.ch011 ◽

2019 ◽

pp. 240-274

Author(s):

Mehmet S. Aktaş ◽

Sinan Kaplan ◽

Hasan Abacı ◽

Oya Kalipsiz ◽

Utku Ketenci ◽

...

Keyword(s):

Missing Data ◽

Expectation Maximization ◽

Missing Values ◽

Nearest Neighbor ◽

Real Life ◽

Data Imputation ◽

K Nearest Neighbor ◽

Missing Data Imputation ◽

Data Scarcity ◽

Imputation Methods

Missing data commonly degrades the quality of data clustering. Most real-life datasets have missing data, which in turn affects clustering tasks. This chapter investigates the appropriate data treatment methods for varying missing data scarcity distributions, including gamma, Gaussian, and beta distributions. The analyzed data imputation methods include mean, hot-deck, regression, k-nearest neighbor, expectation maximization, and multiple imputation. To reveal the proper methods for dealing with missing data, data mining tasks such as clustering are used for evaluation. Through experimental studies, this chapter identifies the correlation between missing data imputation methods and missing data distributions for clustering tasks. The results of the experiments indicate that the expectation maximization and k-nearest neighbor methods provide the best results for varying missing data scarcity distributions.
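
The two simplest methods in that list, mean and hot-deck imputation, can be sketched for a single numeric column as follows. The chapter does not specify which hot-deck variant it uses, so the random-donor version here is an assumption, as are the function names and the `seed` parameter.

```python
import random


def mean_impute(column):
    """Replace each missing entry with the mean of the observed entries."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def hot_deck_impute(column, seed=0):
    """Replace each missing entry with a randomly drawn observed value (donor)."""
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    observed = [v for v in column if v is not None]
    return [rng.choice(observed) if v is None else v for v in column]
```

Mean imputation shrinks the variance of the column, while hot-deck imputation keeps imputed values within the set of values actually observed; both ignore relationships between columns, which regression, k-nearest neighbor, and expectation maximization try to exploit.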


A Review of Current Publications Trend on Missing Data Imputation Over Three Decades: Direction and Future Research

10.21203/rs.3.rs-996596/v1 ◽

2021 ◽

Author(s):

Farah Adibah Adnan ◽

Khairur Rijal Jamaludin ◽

Wan Zuki Azman Wan Muhamad ◽

Suraya Miskon

Keyword(s):

Missing Data ◽

Nearest Neighbor ◽

Future Research ◽

Data Imputation ◽

Classification Problems ◽

Missing Data Imputation ◽

Imputation Methods ◽

Publish Or Perish ◽

Research Fields ◽

Scopus Database

Abstract: Missing values, also referred to as missing data, are an unavoidable issue when collecting data. They are uncontrollable and occur in almost every research field. Hence, this study focuses on identifying current publication trends in missing data imputation techniques (1991-2021), specifically in classification problems, using bibliometric analysis. Most importantly, this research aims to uncover potential missing data imputation methods. Two software tools were used: VOSviewer and Harzing's Publish or Perish. Based on data extracted from the Scopus database in June 2021, the findings indicate an emerging trend in missing data imputation research, with two imputation methods receiving the most attention: the random forest and the nearest neighbor methods.
