Missing Data Imputation by Nearest-neighbor Trained BP for Fuzzy Clustering

2014, Vol 11 (15), pp. 5367-5375
Author(s): Beilei Wang

Sensors, 2021, Vol 21 (5), pp. 1782
Author(s): Yulong Deng, Chong Han, Jian Guo, Lijuan Sun

Missing data is a common problem in wireless sensor networks. To ensure reliable data processing, imputing the missing values before sensor data analysis is currently the most common approach. This paper presents temporal and spatial nearest neighbor values-based missing data imputation (TSNN), a new imputation method based on temporal and spatial nearest neighbor values. First, four nearest neighbor values are defined along the spatial and temporal dimensions, using both geometric and data distances. These form the basis of the algorithm and help exploit the correlations among sensor data on different nodes by means of regression. Next, the algorithm is described in detail, together with its two parameters: the optimal number of neighbors and the spatial–temporal coefficient. Finally, the algorithm is tested on an indoor and an outdoor wireless sensor network. The results show that TSNN improves imputation accuracy and increases the number of cases that can be imputed effectively.
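The abstract does not spell out how TSNN combines its neighbors, so what follows is only a minimal Python sketch of the general spatial-plus-temporal idea, under assumptions of our own. Readings are stored per node as lists with `None` for gaps, and node positions are 2-D coordinates. The temporal estimate linearly interpolates the nearest observed readings at the same node. The spatial estimate averages simple linear-regression predictions from the `k` geometrically nearest nodes. The blending weight `alpha` merely stands in for a spatial–temporal coefficient. It uses two neighbor types rather than the paper's four, and all function names (`temporal_estimate`, `fit_line`, `spatial_estimate`, `impute`) are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Series = Sequence[Optional[float]]


def temporal_estimate(series: Series, t: int) -> Optional[float]:
    """Linearly interpolate between the nearest observed readings before and after t."""
    before = next((s for s in range(t - 1, -1, -1) if series[s] is not None), None)
    after = next((s for s in range(t + 1, len(series)) if series[s] is not None), None)
    if before is None and after is None:
        return None
    if before is None:
        return series[after]
    if after is None:
        return series[before]
    w = (t - before) / (after - before)
    return series[before] + w * (series[after] - series[before])


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0  # constant predictor: fall back to the mean of y
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def spatial_estimate(readings: List[Series], positions: List[Tuple[float, float]],
                     i: int, t: int, k: int = 2, min_pairs: int = 2) -> Optional[float]:
    """Average regression predictions from the k nearest nodes (by geometric distance)
    that are observed at t and share at least min_pairs co-observed time steps with node i."""
    candidates = [j for j in range(len(readings)) if j != i and readings[j][t] is not None]
    candidates.sort(key=lambda j: math.dist(positions[i], positions[j]))
    preds = []
    for j in candidates:
        pairs = [(readings[j][s], readings[i][s]) for s in range(len(readings[i]))
                 if s != t and readings[i][s] is not None and readings[j][s] is not None]
        if len(pairs) < min_pairs:
            continue  # not enough shared history to fit a regression
        a, b = fit_line([p[0] for p in pairs], [p[1] for p in pairs])
        preds.append(a + b * readings[j][t])
        if len(preds) == k:
            break
    return sum(preds) / len(preds) if preds else None


def impute(readings, positions, i, t, k=2, alpha=0.5):
    """Blend spatial and temporal estimates; alpha is a hypothetical spatial-temporal weight."""
    sp = spatial_estimate(readings, positions, i, t, k)
    tp = temporal_estimate(readings[i], t)
    if sp is None:
        return tp
    if tp is None:
        return sp
    return alpha * sp + (1 - alpha) * tp


# Toy data: node 0 is missing its reading at time step 2.
readings = [
    [20.0, 21.0, None, 23.0, 24.0],
    [22.0, 23.0, 24.0, 25.0, 26.0],
    [10.0, 10.5, 11.0, 11.5, 12.0],
]
positions = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
print(impute(readings, positions, i=0, t=2))  # 22.0
```

In this toy data node 0 is exactly node 1 minus 2 and also twice node 2, so the spatial and temporal estimates coincide. On real sensor data they generally differ, and a weight such as `alpha` would need tuning.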


2018, Vol 35 (8), pp. 1278-1283
Author(s): Xuesi Dong, Lijuan Lin, Ruyang Zhang, Yang Zhao, David C Christiani, ...

2020, Vol 21 (10), pp. 2325-2341
Author(s): Fabio Oriani, Simon Stisen, Mehmet C. Demirel, Gregoire Mariethoz

Missing rainfall data are a major limitation for distributed hydrological modeling and climate studies. Practitioners need reliable approaches that can be employed on a daily basis, often with spatial data too sparse to feed complex predictive models. In this study we compare different automatic approaches for missing data imputation, including geostatistical interpolation and pattern-based estimation algorithms. We introduce two pattern-based approaches based on the analysis of historical data patterns: (i) an iterative version of K-nearest neighbor (IKNN) and (ii) a new algorithm called vector sampling (VS) that combines concepts of multiple-point statistics and resampling. Both algorithms can draw estimates from variably incomplete data patterns, so the target dataset can serve at the same time as the training dataset. Tested on five case studies from Denmark, Australia, and Switzerland, the algorithms show differences in performance that appear related to terrain type. On flat terrain with spatially homogeneous rain events, geostatistical interpolation tends to minimize the average error, while in mountainous regions with nonstationary rainfall statistics, data mining recovers rainfall patterns better. The VS algorithm, which requires minimal parameterization, proves a convenient option for routine application on complex and poorly gauged terrain.
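The abstract names IKNN without detailing it, so here is a generic iterative K-nearest-neighbor imputation sketched in Python. It rests on our own assumptions and is not the authors' exact algorithm. The data form a day × station matrix with `None` for gaps, and each day is treated as a pattern. The distance between two days is the root-mean-square difference over the stations observed on both. A missing cell is estimated as the mean of that station's observed values on the `k` closest days. Repeating the pass with the partially filled matrix lets later distance computations use earlier estimates, which is one way an incomplete dataset can act as its own training set. All function names are hypothetical.

```python
import math
from typing import List, Optional

Matrix = List[List[Optional[float]]]


def pattern_distance(a: List[Optional[float]], b: List[Optional[float]], skip: int) -> float:
    """RMS difference over columns observed in both rows, ignoring the target column `skip`."""
    diffs = [(x - y) ** 2 for c, (x, y) in enumerate(zip(a, b))
             if c != skip and x is not None and y is not None]
    return math.sqrt(sum(diffs) / len(diffs)) if diffs else math.inf


def knn_pass(data: Matrix, current: Matrix, k: int) -> Matrix:
    """Re-estimate every originally missing cell from the k rows of `current` closest to it.
    Donor values always come from the original observations in `data`."""
    out = [row[:] for row in current]
    for r, row in enumerate(data):
        for c, value in enumerate(row):
            if value is not None:
                continue
            donors = sorted(
                (pattern_distance(current[r], current[q], c), data[q][c])
                for q in range(len(data))
                if q != r and data[q][c] is not None
            )
            nearest = [v for d, v in donors if d != math.inf][:k]
            if nearest:
                out[r][c] = sum(nearest) / len(nearest)
    return out


def max_change(a: Matrix, b: Matrix) -> float:
    """Largest absolute cell difference; a cell filled in only one matrix counts as infinite."""
    worst = 0.0
    for ra, rb in zip(a, b):
        for x, y in zip(ra, rb):
            if (x is None) != (y is None):
                return math.inf
            if x is not None:
                worst = max(worst, abs(x - y))
    return worst


def iterative_knn(data: Matrix, k: int = 2, max_iter: int = 10, tol: float = 1e-6) -> Matrix:
    """Repeat KNN passes, feeding each pass's estimates into the next pass's distances."""
    current = [row[:] for row in data]
    for _ in range(max_iter):
        updated = knn_pass(data, current, k)
        converged = max_change(updated, current) < tol
        current = updated
        if converged:
            break
    return current


# Rows are days, columns are stations; station 2 is missing on day 1.
rain = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],
    [5.0, 6.0, 7.0],
    [5.2, 6.1, 7.3],
]
print(iterative_knn(rain, k=1)[1][2])  # 3.0
```

Drawing donor values only from original observations keeps earlier estimates from being copied into later ones. Earlier estimates influence only which days count as neighbors.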


2019, Vol 62 (6), pp. 2419-2437
Author(s): Sanaz Nikfalazar, Chung-Hsing Yeh, Susan Bedingfield, Hadi A. Khorshidi

Author(s): Mehmet S. Aktaş, Sinan Kaplan, Hasan Abacı, Oya Kalipsiz, Utku Ketenci, ...

Missing data is a common problem that affects data clustering quality. Most real-life datasets contain missing data, which in turn affects clustering tasks. This chapter investigates appropriate data treatment methods for varying missing data scarcity distributions, including gamma, Gaussian, and beta distributions. The imputation methods analyzed are mean, hot-deck, regression, k-nearest neighbor, expectation maximization, and multiple imputation. To identify suitable methods for handling missing data, a data mining task, clustering, is used for evaluation. Through experimental studies, this chapter identifies the relationship between missing data imputation methods and missing data distributions for clustering tasks. The experimental results indicate that expectation maximization and k-nearest neighbor methods provide the best results across the varying missing data scarcity distributions.


2021
Author(s): Farah Adibah Adnan, Khairur Rijal Jamaludin, Wan Zuki Azman Wan Muhamad, Suraya Miskon

Missing values, sometimes referred to as missing data, are an unavoidable issue when collecting data. They are uncontrollable and occur in almost every research field. Hence, this study focuses on identifying the current publication trend in missing data imputation techniques (1991-2021), specifically for classification problems, using bibliometric analysis. Most importantly, this research aims to uncover potential missing data imputation methods. Two software tools were used: VOSviewer and Harzing's Publish or Perish. Based on the Scopus database extracted in June 2021, the findings indicate an emerging trend in missing data imputation research to date, with two imputation methods receiving the most attention: the random forest and the nearest neighbor methods.

