Grey Relational Analysis Based k Nearest Neighbor Missing Data Imputation for Software Quality Datasets

Author(s):  
Jianglin Huang ◽  
Hongyi Sun
2018 ◽  
Vol 35 (8) ◽  
pp. 1278-1283 ◽  
Author(s):  
Xuesi Dong ◽  
Lijuan Lin ◽  
Ruyang Zhang ◽  
Yang Zhao ◽  
David C Christiani ◽  
...  

Author(s):  
Mehmet S. Aktaş ◽  
Sinan Kaplan ◽  
Hasan Abacı ◽  
Oya Kalipsiz ◽  
Utku Ketenci ◽  
...  

Missing data is a common threat to clustering quality. Most real-life datasets contain missing values, which in turn affect clustering tasks. This chapter investigates appropriate data treatment methods for varying missing data scarcity distributions, including gamma, Gaussian, and beta distributions. The imputation methods analyzed are mean, hot-deck, regression, k-nearest neighbor, expectation maximization, and multiple imputation. To reveal which methods handle missing data properly, clustering is used as the evaluation task. Through experimental studies, this chapter identifies the correlation between missing data imputation methods and missing data distributions for clustering tasks. The experimental results indicate that the expectation maximization and k-nearest neighbor methods provide the best results across the missing data scarcity distributions studied.
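As an illustration of two of the imputation methods named above, the following is a minimal Python sketch of mean imputation and k-nearest neighbor imputation using NumPy. It is not the chapter's implementation: the function names `mean_impute` and `knn_impute`, the Euclidean distance over the features the incomplete row observes, and the toy matrix are all assumptions made for the example.

```python
import numpy as np

def mean_impute(X):
    """Replace each NaN with the mean of the observed values in its column."""
    X = np.asarray(X, dtype=float)
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)

def knn_impute(X, k=2):
    """Fill each NaN with the mean of that feature over the k nearest donor rows.

    Assumption: distance is Euclidean, computed only over the features that
    the incomplete row actually observes.
    """
    X = np.asarray(X, dtype=float)
    out = X.copy()
    n_rows = X.shape[0]
    for i in range(n_rows):
        missing = np.isnan(X[i])
        if not missing.any():
            continue
        observed = ~missing
        for j in np.where(missing)[0]:
            # Donors observe feature j and every feature that row i observes.
            donors = [r for r in range(n_rows)
                      if r != i
                      and not np.isnan(X[r, j])
                      and not np.isnan(X[r, observed]).any()]
            if not donors:
                continue  # no usable donor: leave the value missing
            dists = np.array([np.linalg.norm(X[r, observed] - X[i, observed])
                              for r in donors])
            nearest = np.array(donors)[np.argsort(dists)[:k]]
            out[i, j] = X[nearest, j].mean()
    return out

# Hypothetical toy data: the last row is missing its third feature.
X_demo = np.array([[1.0, 2.0, 3.0],
                   [2.0, 3.0, 5.0],
                   [10.0, 20.0, 30.0],
                   [1.0, 2.0, np.nan]])
print(mean_impute(X_demo)[3, 2])
print(knn_impute(X_demo, k=2)[3, 2])
```

On this toy matrix the mean imputer is pulled toward the outlying third row, while the k-nearest neighbor imputer draws only on the two rows closest to the incomplete one.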


Sensors ◽  
2021 ◽  
Vol 21 (5) ◽  
pp. 1782
Author(s):  
Yulong Deng ◽  
Chong Han ◽  
Jian Guo ◽  
Lijuan Sun

Missing data is a common problem in wireless sensor networks. Currently, to ensure the performance of data processing, imputing the missing data is the most common step taken before sensor data analysis. This paper presents temporal and spatial nearest neighbor values-based missing data imputation (TSNN), a new imputation method. First, four nearest neighbor values are defined from the perspective of the space and time dimensions as well as the geometrical and data distances; these form the basis of the algorithm and help exploit the correlations among sensor data on the nodes with a regression tool. Next, the algorithm is elaborated, along with its two parameters: the best number of neighbors and the spatial–temporal coefficient. Finally, the algorithm is tested on an indoor and an outdoor wireless sensor network, and the results show that TSNN improves the accuracy of imputation and increases the number of cases that can be imputed effectively.


2019 ◽  
Vol 6 (8) ◽  
pp. 181860 ◽  
Author(s):  
Qingwei Xu ◽  
Kaili Xu ◽  
Li Li ◽  
Xiwen Yao

Due to its wide range of applications, sand casting occupies an important position in modern casting practice. The main purpose of this study was to optimize the performance parameters of sand casting based on grey relational analysis and to predict missing data using a back propagation (BP) neural network. First, the influence of human factors was eliminated by adopting the objective entropy weight method, which also saved manpower. The larger the degree of variation in an evaluation indicator, the better the evaluated projects are discriminated in that respect, and the larger the weight that indicator should receive. Second, the performance parameters of sand casting were optimized based on grey relational analysis, providing a reference for sand milling. The larger the grey relational degree, the closer the evaluated project was to the ideal project. Third, this paper provided a new method for determining the number of hidden neurons in a network according to the mean square error of the training samples, and venting quality was predicted with the BP neural network. The relevant theory was derived before the missing data were predicted, to give a general understanding of the prediction principle of the BP neural network. Fourth, to demonstrate the validity of the BP neural network for missing data prediction, grey system theory was applied as a comparison for the prediction results.
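The entropy weighting and grey relational ranking described above can be sketched in Python as follows. This uses a common textbook formulation and is not necessarily the exact procedure of the paper: the function names, the larger-is-better min-max normalization, the all-ones ideal reference sequence, the distinguishing coefficient `rho = 0.5`, and the toy scores are assumptions made for the example.

```python
import numpy as np

def entropy_weights(X):
    """Entropy weight method: indicators that vary more get larger weights.

    Assumes X (projects x indicators) holds positive values and that at
    least one indicator is not constant across projects.
    """
    X = np.asarray(X, dtype=float)
    m = X.shape[0]
    P = X / X.sum(axis=0)                    # share of each project per indicator
    logP = np.log(P, where=P > 0, out=np.zeros_like(P))  # treat 0*log(0) as 0
    entropy = -(P * logP).sum(axis=0) / np.log(m)
    divergence = 1.0 - entropy               # low entropy means high variation
    return divergence / divergence.sum()

def grey_relational_grades(X, weights, rho=0.5):
    """Weighted grey relational grade of each project against the ideal project.

    Assumes every indicator is larger-is-better and non-constant.
    """
    X = np.asarray(X, dtype=float)
    span = X.max(axis=0) - X.min(axis=0)
    Z = (X - X.min(axis=0)) / span           # min-max normalization to [0, 1]
    delta = np.abs(1.0 - Z)                  # distance to the ideal sequence of ones
    d_min, d_max = delta.min(), delta.max()
    coeff = (d_min + rho * d_max) / (delta + rho * d_max)
    return coeff @ weights

# Hypothetical scores: three projects evaluated on two indicators.
scores = np.array([[9.0, 5.0],
                   [7.0, 5.1],
                   [5.0, 4.9]])
w = entropy_weights(scores)
grades = grey_relational_grades(scores, w)
print(w, grades, int(np.argmax(grades)))
```

A larger grade means the project lies closer to the ideal project, so the projects can be ranked by sorting the grades in descending order.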

