Optimization of Missing Value Data Imputation Automatic Dependent Surveillance Broadcasting (ADS-B) Based on K-Nearest Neighbor and Genetic Algorithm

Flight navigation equipment still relies on conventional radar technology, but is slowly switching to Automatic Dependent Surveillance-Broadcast (ADS-B) [6]. This study uses an RTL-SDR to detect aircraft and carries out tests through the Monte Carlo method on altitude, latitude, and longitude only [3]. However, the preprocessed ADS-B flow data in this system contain missing values. KNN is the most popular method for handling missing values, but its weakness can reduce performance [9]. A Genetic Algorithm (GA) is therefore proposed to optimize the k value in the KNN method. The study obtained better MSE values in the imputation process: Altitude, k = 3 with MSE = 128668.96; Speed, k = 6 with MSE = 457.5201; Heading, k = 61 with MSE = 752.1429; Latitude, k = 3 with MSE = 9.16E-05; and Longitude, k = 2 with MSE = 1.68E-05.
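The idea of tuning k by imputation error can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`knn_predict`, `imputation_mse`, `ga_select_k`), the GA settings (population size, mutation rate, tournament selection), and the synthetic sine-wave data are all assumptions made for the example. Known values are hidden, imputed with KNN, and the resulting MSE serves as the fitness that a small genetic algorithm minimizes over integer k.

```python
import math
import random

def knn_predict(train_x, train_y, query, k):
    """Mean target value of the k training points nearest to `query` (Euclidean)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def imputation_mse(xs, ys, k, holdout):
    """Hide the values at the `holdout` indices, impute them with KNN, return the MSE."""
    train = [i for i in range(len(xs)) if i not in holdout]
    tx = [xs[i] for i in train]
    ty = [ys[i] for i in train]
    errors = [(knn_predict(tx, ty, xs[i], k) - ys[i]) ** 2 for i in holdout]
    return sum(errors) / len(errors)

def ga_select_k(fitness, k_max, pop_size=8, generations=15, seed=0):
    """Toy GA over a single integer gene k in [1, k_max]; lower fitness is better."""
    rng = random.Random(seed)
    pop = [rng.randint(1, k_max) for _ in range(pop_size)]
    best = min(pop, key=fitness)
    for _ in range(generations):
        # Tournament selection: the better of two random individuals becomes a parent.
        parents = [min(rng.sample(pop, 2), key=fitness) for _ in range(pop_size)]
        children = []
        for a, b in zip(parents[::2], parents[1::2]):
            # Crossover: one child averages the parents, the other copies one of them.
            for child in ((a + b) // 2, a if rng.random() < 0.5 else b):
                if rng.random() < 0.3:  # mutation: small random step
                    child += rng.choice([-2, -1, 1, 2])
                children.append(min(max(child, 1), k_max))
        pop = children
        best = min(pop + [best], key=fitness)  # elitism: never lose the best k seen
    return best

# Synthetic stand-in for one ADS-B variable (assumption: not real flight data).
xs = [(float(i),) for i in range(30)]
ys = [math.sin(i / 3) for i in range(30)]
holdout = list(range(2, 30, 5))
fitness = lambda k: imputation_mse(xs, ys, k, holdout)
best_k = ga_select_k(fitness, k_max=10)
print("selected k:", best_k, "MSE:", fitness(best_k))
```

With a single integer gene, an exhaustive search over k would be just as practical; the GA form is kept here only to mirror the approach the abstract describes.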


Efficient technique of microarray missing data imputation using clustering and weighted nearest neighbour

Scientific Reports ◽ 10.1038/s41598-021-03438-x ◽ 2021 ◽ Vol 11 (1)

Author(s): Aditya Dubey ◽ Akhtar Rasool

Keyword(s): Gene Expression ◽ Missing Data ◽ Spectral Clustering ◽ Missing Values ◽ Nearest Neighbor ◽ Local Similarity ◽ K Nearest Neighbor ◽ Microarray Gene Expression ◽ Missing Value ◽ Hardware Failure

Abstract: For most bioinformatics statistical methods, particularly for gene expression data classification, prognosis, and prediction, a complete dataset is required. A gene sample value can be missing due to hardware failure, software failure, or manual mistakes. Missing data in gene expression research dramatically affects the analysis of the collected data, so an efficient imputation algorithm is needed. This paper proposes a technique that considers the local similarity structure and predicts the missing data using clustering and top K nearest neighbor approaches. A similarity-based spectral clustering approach is combined with K-means. The spectral clustering parameters, cluster size, and weighting factors are optimized, after which the missing values are predicted. To impute each cluster's missing values, the top K nearest neighbor approach uses weighted distance. The evaluation is carried out on numerous datasets from a variety of biological areas, with experimentally inserted missing values varying from 5 to 25%. Experimental results show that the proposed imputation technique makes more accurate predictions than other imputation procedures. The imputation experiments use microarray gene expression datasets containing information on different cancers and tumors. The main contribution of this research is to show that local similarity-based techniques can be used for imputation even when the dataset has varying dimensionality and characteristics.
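The weighted-distance step can be illustrated with a small sketch. It covers only a generic inverse-distance weighted top-K imputation and omits the spectral clustering and parameter optimization described in the abstract; the function name `weighted_knn_impute`, the `None` encoding for missing cells, and the inverse-distance weights are assumptions for illustration, not the paper's exact weighting scheme.

```python
import math

def weighted_knn_impute(matrix, row, col, k=3, eps=1e-9):
    """Estimate matrix[row][col] from the k nearest rows that observe column `col`.

    Missing cells are encoded as None. Distances use only columns observed in
    both rows, and closer neighbors receive larger (inverse-distance) weights.
    """
    target = matrix[row]
    candidates = []
    for i, other in enumerate(matrix):
        if i == row or other[col] is None:
            continue
        shared = [j for j in range(len(target))
                  if j != col and target[j] is not None and other[j] is not None]
        if not shared:
            continue
        # Root-mean-square difference over the shared columns.
        d = math.sqrt(sum((target[j] - other[j]) ** 2 for j in shared) / len(shared))
        candidates.append((d, other[col]))
    if not candidates:
        raise ValueError("no neighbor with an observed value in this column")
    candidates.sort(key=lambda pair: pair[0])
    top = candidates[:k]
    weights = [1.0 / (d + eps) for d, _ in top]
    return sum(w * v for w, (_, v) in zip(weights, top)) / sum(weights)

# Toy expression matrix (hypothetical values): rows are genes, columns are samples.
expr = [
    [0.0, None],
    [1.0, 4.0],
    [-1.0, 8.0],
]
print(weighted_knn_impute(expr, row=0, col=1, k=2))
```

In the toy matrix both neighbors are equally distant, so the estimate is their plain average; when distances differ, the nearer row dominates.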


Perbaikan Missing value Menggunakan Pendekatan Korelasi Pada Metode K-Nearest Neighbor (Missing Value Repair Using a Correlation Approach in the K-Nearest Neighbor Method)

JURNAL INFOTEL ◽ 10.20895/infotel.v9i3.286 ◽ 2017 ◽ Vol 9 (3)

Author(s): Novta Dany'el Irawan ◽ Wijono Wijono ◽ Onny Setyawati

Keyword(s): Social Science ◽ Natural Science ◽ Nearest Neighbor ◽ Classification Method ◽ K Nearest Neighbor ◽ Missing Value ◽ K Value ◽ Data Value ◽ Science Major ◽ Measuring Tool

Missing values often occur in classification when information about an object is not given, is difficult to find, or is unavailable. They reduce accuracy and data quality during analysis. A correlation approach was used because the existence and strength of the correlation between variables related to the object or subject under study must be known. The classification method used is K-NN, chosen because it is a strongly consistent classification method that finds a case by calculating the closeness between the new case and old cases based on the K value, the number of nearest neighbors. The correlation approach can overcome missing values, as evidenced by improved classification results and the disappearance of unclassified data. A questionnaire containing questions given to respondents served as the measuring tool; the questionnaire results were analyzed to determine the level of correlation of the backup data. Once that level of correlation was obtained, the backup data were used as substitutes for the missing data values. Before replacement, the 500 records with missing values were classified as 88 natural science major students, 126 social science major students, 271 language major students, and 15 unclassified/false students. After replacement, the 500 records were classified as 102 natural science major students, 316 social science major students, and 82 language major students, with no unclassified data. Experiments were run with k = 3, 5, 7, 9, and 11; k = 5 gave a high accuracy of 97.0%, so k = 5 was used for the K-NN major classification in this study.


Data Imputation Methods for Missing Values in the Context of Clustering

Big Data and Knowledge Sharing in Virtual Organizations - Advances in Knowledge Acquisition, Transfer, and Management ◽ 10.4018/978-1-5225-7519-1.ch011 ◽ 2019 ◽ pp. 240-274

Author(s): Mehmet S. Aktaş ◽ Sinan Kaplan ◽ Hasan Abacı ◽ Oya Kalipsiz ◽ Utku Ketenci ◽ ...

Keyword(s): Missing Data ◽ Expectation Maximization ◽ Missing Values ◽ Nearest Neighbor ◽ Real Life ◽ Data Imputation ◽ K Nearest Neighbor ◽ Missing Data Imputation ◽ Data Scarcity ◽ Imputation Methods

Missing data is a common problem that degrades data clustering quality. Most real-life datasets have missing data, which in turn affects clustering tasks. This chapter investigates the appropriate data treatment methods for varying missing data scarcity distributions, including gamma, Gaussian, and beta distributions. The analyzed data imputation methods include mean, hot-deck, regression, k-nearest neighbor, expectation maximization, and multiple imputation. To reveal the proper methods for dealing with missing data, data mining tasks such as clustering are used for evaluation. Through experimental studies, this chapter identifies the correlation between missing data imputation methods and missing data distributions for clustering tasks. The experimental results indicate that the expectation maximization and k-nearest neighbor methods provide the best results for varying missing data scarcity distributions.


Kernel weighted least square approach for imputing missing values of metabolomics data

Scientific Reports ◽ 10.1038/s41598-021-90654-0 ◽ 2021 ◽ Vol 11 (1)

Author(s): Nishith Kumar ◽ Md. Aminul Hoque ◽ Masahiro Sugimoto

Keyword(s): Missing Data ◽ Large Scale ◽ Missing Values ◽ Kernel Weight ◽ Least Square ◽ Data Matrix ◽ Data Imputation ◽ Metabolomics Data ◽ Missing Value ◽ Missing Data Imputation

Abstract: Mass spectrometry is a modern and sophisticated high-throughput analytical technique that enables large-scale metabolomic analyses. It yields a high-dimensional, large-scale matrix (samples × metabolites) of quantified data that often contains missing cells as well as outliers, which originate for several reasons, including technical and biological sources. Although several missing data imputation techniques are described in the literature, the conventional techniques address only the missing value problem and do not mitigate outliers, so outliers in the dataset decrease the accuracy of the imputation. We developed a new kernel weight function-based missing data imputation technique that resolves the problems of both missing values and outliers. We evaluated the performance of the proposed method and of other conventional and recently developed imputation techniques using both artificially generated data and experimentally measured data, in both the absence and presence of different rates of outliers. Performance on both artificial data and real metabolomics data indicates the superiority of the proposed kernel weight-based technique over the existing alternatives. For user convenience, an R package of the proposed kernel weight-based missing value imputation technique was developed, which is available at https://github.com/NishithPaul/tWLSA.


Symmetry Breaking and Training from Incomplete Data with Radial Basis Boltzmann Machines

International Journal of Neural Systems ◽ 10.1142/s0129065797000318 ◽ 1997 ◽ Vol 08 (03) ◽ pp. 301-315 ◽ Cited By ~ 8

Author(s): Marcel J. Nijman ◽ Hilbert J. Kappen

Keyword(s): Symmetry Breaking ◽ Incomplete Data ◽ Missing Values ◽ Nearest Neighbor ◽ Boltzmann Machine ◽ K Nearest Neighbor ◽ Data Set ◽ Input Space ◽ Learning Rules ◽ Radial Basis

A Radial Basis Boltzmann Machine (RBBM) is a specialized Boltzmann Machine architecture that combines feed-forward mapping with probability estimation in the input space, and for which very efficient learning rules exist. The hidden representation of the network displays symmetry breaking as a function of the noise in the dynamics. Thus, generalization can be studied as a function of the noise in the neuron dynamics instead of as a function of the number of hidden units. We show that the RBBM can be seen as an elegant alternative to k-nearest neighbor, leading to comparable performance without the need to store all data. We show that the RBBM has good classification performance compared to the MLP. The main advantage of the RBBM is that, simultaneously with the input-output mapping, a model of the input space is obtained that can be used for learning with missing values. We derive learning rules for the case of incomplete data, and show that they perform better on incomplete data than the traditional learning rules on a 'repaired' data set.


Multiple Regression and K-Nearest-Neighbor Based Algorithm for Estimating Missing Values within Sensor

10.1109/icnisc54316.2021.00116 ◽ 2021

Author(s): Xiantong Li ◽ Yuan Sui

Keyword(s): Multiple Regression ◽ Missing Values ◽ Nearest Neighbor ◽ K Nearest Neighbor


Analysis of gabor filter based features with PCA and GA for the detection of drusen in fundus images

International Journal of Engineering & Technology ◽ 10.14419/ijet.v7i1.8969 ◽ 2018 ◽ Vol 7 (1) ◽ pp. 115

Author(s): Sheela N. ◽ Basavaraj L.

Keyword(s): Genetic Algorithm ◽ Nearest Neighbor ◽ Gabor Filter ◽ Age Related Macular Degeneration ◽ Misclassification Rate ◽ Support Vector ◽ K Nearest Neighbor ◽ Automated Method ◽ Age Related ◽ Predictive Rate

The human eye can be affected by different types of diseases. Age-Related Macular Degeneration (AMD) is one such disease, and it mainly occurs after 50 years of age. It is characterized by the occurrence of yellow spots called drusen. In this work, an automated method for detecting drusen in fundus images has been developed and tested on 70 images, consisting of 30 normal images and 40 images with drusen. The performance of the Support Vector Machine (SVM) and K Nearest Neighbor (KNN) classifiers has been evaluated with data reduction using Principal Component Analysis (PCA) and data selection using Genetic Algorithm (GA). Performance has been evaluated in terms of accuracy, sensitivity, specificity, misclassification rate, positive predictive rate, negative predictive rate, and Youden's Index. The proposed method achieved its highest accuracy of 98.7% when data selection using Genetic Algorithm was applied.
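The evaluation measures listed above all follow from the four cells of a binary confusion matrix, using their standard definitions. The sketch below shows those formulas; the function name `binary_metrics` and the example counts are hypothetical and are not results from the paper.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard evaluation measures from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "misclassification_rate": (fp + fn) / total,
        "positive_predictive_rate": tp / (tp + fp),
        "negative_predictive_rate": tn / (tn + fn),
        "youden_index": sensitivity + specificity - 1,
    }

# Hypothetical counts for 40 drusen images and 30 normal images (not the paper's data).
metrics = binary_metrics(tp=39, fp=1, tn=29, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Youden's Index summarizes sensitivity and specificity in one number: 0 means the classifier is no better than chance, and 1 means it makes no errors.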


The K Nearest Neighbor Algorithm for Imputation of Missing Longitudinal Prenatal Alcohol Data

10.21203/rs.3.rs-32456/v2 ◽ 2021

Author(s): Ayesha Sania ◽ Nicolo Pini ◽ Morgan Nelson ◽ Michael Myers ◽ Lauren Shuffrey ◽ ...

Keyword(s): Missing Data ◽ Missing Values ◽ Nearest Neighbor ◽ Learning Algorithm ◽ Drinking Behavior ◽ Nearest Neighbors ◽ First Trimester ◽ Epidemiologic Studies ◽ K Nearest Neighbor ◽ Timeline Followback

Abstract. Background: Missing data are a source of bias in epidemiologic studies. This is problematic in alcohol research, where data missingness is linked to drinking behavior. Methods: The Safe Passage study was a prospective investigation of prenatal drinking and fetal/infant outcomes (n=11,083). Daily alcohol consumption for the last reported drinking day and the 30 days prior was recorded using the Timeline Followback method. Of 3.2 million person-days, data were missing for 0.36 million. We imputed missing data using a machine learning algorithm, “K Nearest Neighbor” (K-NN). K-NN imputes missing values for a participant using data from the participants closest to it. Imputed values were weighted by distance from the nearest neighbors and matched for day of week. Validation was done on randomly deleted data for 5-15 consecutive days. Results: Data from 5 nearest neighbors and segments of 55 days provided imputed values with the least imputation error. After deleting data segments from first-trimester data with no missing days, there was no difference between actual and predicted values for 64% of deleted segments. For 31% of the segments, imputed data were within +/-1 drink/day of the actual values. Conclusions: K-NN can be used to impute missing data in longitudinal studies of alcohol use during pregnancy with high accuracy.


Mangrove Forest Classification in Drone Images Using HSV Color Moment and Haralick Features Extraction with K-Nearest Neighbor

Signal and Image Processing Letters ◽ 10.31763/simple.v1i3.6 ◽ 2019 ◽ Vol 1 (3) ◽ pp. 1-12

Author(s): Agus Wahyu Widodo ◽ Deo Hernando ◽ Wayan Firdaus Mahmudy

Keyword(s): Mangrove Forest ◽ Nearest Neighbor ◽ Classification Method ◽ Mangrove Forests ◽ K Nearest Neighbor ◽ Distance Method ◽ K Value ◽ Nearest Neighbor Classification ◽ Haralick Features ◽ Neighbor Classification

Because of uncontrolled changes in mangrove forests, forest function management and supervision are required. The form of mangrove forest management carried out in this study is measuring the area of mangrove forests by observing them with drones, or crewless aircraft. Drones are used to take photos because they can capture vast mangrove forests at high resolution. The drone was flown above the mangrove forest and took several photos. The method used in this study extracts color features using mean values, standard deviations, and skewness in the HSV color space, and extracts texture features with Haralick features. The classification method used is the k-nearest neighbor method. This study conducted three tests: testing the accuracy of the system, testing the distance method used in the k-nearest neighbor classification, and testing the k value. Three conclusions were obtained from these tests. First, the classification system produces an accuracy of 84%. Second, the distance method used in the k-nearest neighbor classification influences the accuracy of the system; the Euclidean distance method produces the highest accuracy, 84%. Third, the k value used in the k-nearest neighbor classification influences the accuracy of the system; k = 3 produces the highest accuracy, 84%.
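The color-feature step (mean, standard deviation, and skewness per HSV channel) can be sketched as below. This is an illustrative assumption of how such color moments are commonly computed, not the authors' code: the function name `hsv_color_moments` is hypothetical, skewness is taken as the signed cube root of the third central moment, hue is treated as a linear value even though it is circular, and the Haralick texture features are omitted.

```python
import colorsys
import math

def hsv_color_moments(pixels):
    """Return 9 features: (mean, std, skewness) for each of the H, S, V channels.

    `pixels` is a sequence of (r, g, b) tuples with components in 0..255.
    """
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in pixels]
    features = []
    for channel in range(3):
        values = [p[channel] for p in hsv]
        n = len(values)
        mean = sum(values) / n
        variance = sum((v - mean) ** 2 for v in values) / n
        third = sum((v - mean) ** 3 for v in values) / n
        # Signed cube root keeps skewness on the same scale as the channel values.
        skewness = math.copysign(abs(third) ** (1 / 3), third)
        features.extend([mean, math.sqrt(variance), skewness])
    return features

# Tiny hypothetical "image": one black pixel and one white pixel.
feats = hsv_color_moments([(0, 0, 0), (255, 255, 255)])
print(feats)
```

The resulting 9-element vector would typically be concatenated with the texture features before being passed to the k-nearest neighbor classifier.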


Design and Analysis System of KNN and ID3 Algorithm for Music Classification based on Mood Feature Extraction

International Journal of Electrical and Computer Engineering (IJECE) ◽ 10.11591/ijece.v7i1.pp486-495 ◽ 2017 ◽ Vol 7 (1) ◽ pp. 486

Author(s): Made Sudarma ◽ I Gede Harsemadi

Keyword(s): Feature Extraction ◽ Processing Time ◽ Nearest Neighbor ◽ Extraction Process ◽ Performance Comparison ◽ K Nearest Neighbor ◽ K Value ◽ Id3 Algorithm ◽ Analysis System ◽ Music Information

Every piece of music conveys its own mood, and much research in the Music Information Retrieval (MIR) field has therefore addressed mood recognition in music. This research produced software that classifies music by mood using the K-Nearest Neighbor and ID3 algorithms. It compares accuracy and measures average classification time, based on the values produced by the music feature extraction process. The feature extraction process uses 9 types of spectral analysis, with 400 training data and 400 testing data. The system outputs a mood classification label: contentment, exuberance, depression, or anxious. Classification using KNN performs well, with an accuracy of 86.55% at k = 3 and an average processing time of 0.01021 second, whereas ID3 yields an accuracy of 59.33% and an average processing time of 0.05091 second.
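The KNN side of this comparison can be illustrated with a minimal majority-vote classifier. The sketch assumes Euclidean distance and two-dimensional toy feature vectors; the function name `knn_classify` and all feature values are hypothetical, and the paper's 9 spectral features and ID3 tree are not reproduced.

```python
import math
from collections import Counter

def knn_classify(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # On a tie, Counter keeps insertion order, so the closest tied label wins.
    return votes.most_common(1)[0][0]

# Hypothetical 2-D mood features (for example, energy and brightness).
train_x = [(0.9, 0.8), (0.8, 0.9), (0.85, 0.7), (0.1, 0.2), (0.2, 0.1), (0.15, 0.3)]
train_y = ["exuberance", "exuberance", "exuberance",
           "depression", "depression", "depression"]
print(knn_classify(train_x, train_y, (0.7, 0.75), k=3))
```

Because KNN defers all work to query time, its classification cost grows with the size of the training set, which is worth keeping in mind when comparing processing times against a tree-based method such as ID3.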
