Missing data imputation using Evolutionary k-Nearest Neighbor algorithm for gene expression data

Author(s): Hiroshi de Silva, A. Shehan Perera
2018, Vol. 35 (8), pp. 1278-1283
Author(s): Xuesi Dong, Lijuan Lin, Ruyang Zhang, Yang Zhao, David C Christiani, ...

2020, Vol. 9 (4), pp. 227
Author(s): Karshiev Sanjar, Olimov Bekhzod, Jaesoo Kim, Anand Paul, Jeonghong Kim

Accurate house price forecasts are very important for formulating national economic policies. In this paper, we offer an effective method to predict houses’ sale prices. Our algorithm uses one-hot encoding to convert text data into numeric data, feature correlation to select only the most correlated variables, and a technique for handling missing data. Our approach handles missing data in large datasets effectively by applying the K-nearest neighbor algorithm to the most correlated features (KNN–MCF). To the best of our knowledge, no previous research has focused on using the most important features to deal with missing observations. The proposed method achieves a prediction accuracy of 92.01% with the random forest algorithm, outperforming the typical machine learning prediction algorithms it was compared against.
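The abstract names the ingredients of KNN–MCF without spelling out the procedure. The following is a minimal Python/NumPy sketch of the general idea, under assumptions that are ours rather than the authors': the inputs are already numeric (for example, after one-hot encoding); "most correlated" means the largest absolute Pearson correlation with the target (here, sale price); distances are root-mean-square differences over the z-scored selected features; and each gap is filled with the mean of the k nearest donor rows. The function names `top_correlated_features` and `knn_mcf_impute` and the toy data are hypothetical.

```python
# Hypothetical sketch of KNN imputation restricted to the most correlated
# features (in the spirit of KNN-MCF); not the authors' implementation.
import numpy as np


def top_correlated_features(X, y, m):
    """Indices of the m columns of X with the largest |Pearson r| against y.
    Each r uses only rows where both the column and y are observed."""
    scores = []
    for j in range(X.shape[1]):
        ok = ~np.isnan(X[:, j]) & ~np.isnan(y)
        col, tgt = X[ok, j], y[ok]
        if col.size < 2 or col.std() == 0 or tgt.std() == 0:
            scores.append(0.0)  # correlation undefined: rank the column last
        else:
            scores.append(abs(np.corrcoef(col, tgt)[0, 1]))
    return [int(j) for j in np.argsort(scores)[::-1][:m]]


def knn_mcf_impute(X, features, k=3):
    """Fill each NaN with the mean of that column over the k nearest donor rows.
    Distance = RMS difference over the selected features, z-scored and
    restricted to features observed in both rows."""
    X = np.asarray(X, dtype=float)
    F = X[:, features]
    std = np.nanstd(F, axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    F = (F - np.nanmean(F, axis=0)) / std  # put selected features on one scale
    out = X.copy()
    for i, j in zip(*np.where(np.isnan(X))):
        dists = []
        for r in range(X.shape[0]):
            if r == i or np.isnan(X[r, j]):
                continue  # a donor must have column j observed
            both = ~np.isnan(F[i]) & ~np.isnan(F[r])
            if both.any():
                dists.append((np.sqrt(np.mean((F[i, both] - F[r, both]) ** 2)), r))
        dists.sort()  # nearest first; ties broken by row index
        donors = [r for _, r in dists[:k]]
        # fall back to the column mean if no row can be compared
        out[i, j] = X[donors, j].mean() if donors else np.nanmean(X[:, j])
    return out


# Toy data: 5 houses, 3 numeric features, target y (e.g. sale price)
X = np.array([[1.0, 10.0, 5.0],
              [2.0, 20.0, np.nan],
              [3.0, 30.0, 9.0],
              [4.0, np.nan, 6.0],
              [5.0, 50.0, 8.0]])
y = np.array([100.0, 200.0, 300.0, 400.0, 500.0])

feats = top_correlated_features(X, y, m=2)  # columns 0 and 1 for this data
filled = knn_mcf_impute(X, feats, k=2)
print(feats)
print(filled)
```

Restricting the distance to a few highly correlated columns keeps the neighbor search cheap on wide one-hot-encoded tables and ignores columns that carry little signal about the target. The values of `m` and `k` here are arbitrary illustrations, not settings taken from the paper.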


Author(s): Mehmet S. Aktaş, Sinan Kaplan, Hasan Abacı, Oya Kalipsiz, Utku Ketenci, ...

Missing data is a common problem that affects data clustering quality. Most real-life datasets contain missing data, which in turn can influence the outcome of clustering tasks. This chapter investigates appropriate data treatment methods for varying missing data scarcity distributions, including gamma, Gaussian, and beta distributions. The analyzed data imputation methods include mean, hot-deck, regression, k-nearest neighbor, expectation maximization, and multiple imputation. To identify the proper methods for dealing with missing data, a data mining task, clustering, is used for evaluation. Through experimental studies, this chapter identifies the correlation between missing data imputation methods and missing data distributions for clustering tasks. The experimental results indicate that the expectation maximization and k-nearest neighbor methods provide the best results across the varying missing data scarcity distributions.
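The abstract describes an evaluation protocol rather than a single algorithm: impute the gaps, cluster the result, and compare it with a reference. The following is a minimal Python/NumPy sketch of that protocol for two of the six methods (mean and k-nearest neighbor imputation). The assumptions are ours, not the chapter's: the data are synthetic (two Gaussian blobs), values are removed completely at random at a 20% rate (the gamma, Gaussian, and beta missingness patterns studied in the chapter are not reproduced), the clustering task is k-means with k = 2, and the score is agreement with the clustering of the complete data. All function names are hypothetical.

```python
# Hypothetical evaluation sketch: judge imputation by how well clustering of
# the imputed data matches clustering of the complete data.
import numpy as np


def mean_impute(X):
    """Replace each NaN with its column mean over observed values."""
    X = X.copy()
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = np.nanmean(X, axis=0)[cols]
    return X


def knn_impute(X, k=5):
    """Replace each NaN with the column mean of the k nearest rows
    (RMS distance over columns observed in both rows)."""
    out = X.copy()
    for i, j in zip(*np.where(np.isnan(X))):
        dists = []
        for r in range(X.shape[0]):
            if r == i or np.isnan(X[r, j]):
                continue  # donor must have column j observed
            both = ~np.isnan(X[i]) & ~np.isnan(X[r])
            if both.any():
                dists.append((np.sqrt(np.mean((X[i, both] - X[r, both]) ** 2)), r))
        dists.sort()
        donors = [r for _, r in dists[:k]]
        out[i, j] = X[donors, j].mean() if donors else np.nanmean(X[:, j])
    return out


def kmeans(X, k=2, iters=100):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(axis=2), axis=1)
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                        for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def agreement(a, b):
    """Share of points grouped alike by two 2-cluster labelings (labels may be swapped)."""
    same = np.mean(a == b)
    return max(same, 1.0 - same)


rng = np.random.default_rng(0)
complete = np.vstack([rng.normal(0.0, 1.0, (50, 4)), rng.normal(4.0, 1.0, (50, 4))])
incomplete = complete.copy()
incomplete[rng.random(complete.shape) < 0.2] = np.nan  # ~20% MCAR (assumption)

reference = kmeans(complete)
scores = {name: agreement(reference, kmeans(impute(incomplete)))
          for name, impute in [("mean", mean_impute), ("knn", knn_impute)]}
print(scores)
```

Hot-deck, regression, expectation maximization, and multiple imputation would slot into the same loop as further entries in the list of imputers. The score is only meaningful relative to the complete-data clustering, and on synthetic data it says nothing about the chapter's reported results.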

