Comparison of running time between C4.5 and k-nearest neighbor (k-NN) algorithm on deciding mainstay area clustering

A sustainable development activity needs a good plan so that its programs are effective and have a clear objective. A model that supports the analysis is therefore needed to determine which areas should be prioritized for future development. This research applies Klassen typology to analyze gross regional domestic product (PDRB) data for Papua Province. The Klassen typology analysis yields four quadrants of area classification in Papua Province. Twenty-nine regencies were analyzed using PDRB data to determine which areas can serve as priority areas for future development. The methods used in this study are C4.5 and k-nearest neighbor (k-NN). Time complexity serves as a test standard for how efficiently an algorithm executes when it is implemented in a programming language. Asymptotic analysis using Big-O notation is one of the techniques commonly used to assess the time complexity of an algorithm. The test results for both methods show that the running time of k-NN is more stable than that of C4.5, although Big-O analysis gives the same complexity for both.
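As a rough illustration of measuring how stable a running time is, the sketch below times repeated calls to a linear-scan nearest-neighbor lookup and summarizes the spread. It is a minimal sketch, not the study's benchmark: the function names `nearest_label` and `time_runs`, the toy data, and the use of the standard deviation as the stability measure are all assumptions.

```python
import statistics
import time


def nearest_label(train, query):
    """Linear-scan 1-NN: one distance per training row, so O(n * d) per query."""
    best_label, best_dist = None, float("inf")
    for features, label in train:
        dist = sum((a - b) ** 2 for a, b in zip(features, query))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def time_runs(fn, repeats=5):
    """Return the wall-clock duration in seconds of each of `repeats` calls."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


# Hypothetical toy data: (features, quadrant label) pairs.
train = [((0.1, 0.2), 1), ((0.9, 0.8), 4), ((0.2, 0.9), 2), ((0.8, 0.1), 3)]
durations = time_runs(lambda: nearest_label(train, (0.15, 0.25)))
mean_s = statistics.mean(durations)
spread_s = statistics.pstdev(durations)  # smaller spread = more stable timing
```

Two algorithms with the same Big-O bound can still differ in `spread_s`, because asymptotic analysis ignores constant factors and run-to-run variation.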

An improved OPTICS clustering algorithm for discovering clusters with uneven densities

Intelligent Data Analysis ◽

10.3233/ida-205497 ◽

2021 ◽

Vol 25 (6) ◽

pp. 1453-1471

Author(s):

Chunhua Tang ◽

Han Wang ◽

Zhiwen Wang ◽

Xiangkun Zeng ◽

Huaran Yan ◽

...

Keyword(s):

Time Complexity ◽

Clustering Algorithm ◽

Nearest Neighbor ◽

Clustering Algorithms ◽

Substantial Improvement ◽

Experimental Results ◽

High Time ◽

Parameter Setting ◽

K Nearest Neighbor ◽

Density Based Clustering

Most density-based clustering algorithms suffer from difficult parameter setting, high time complexity, poor noise recognition, and weak clustering of datasets with uneven density. To solve these problems, this paper proposes the FOP-OPTICS algorithm (Finding of the Ordering Peaks Based on OPTICS), a substantial improvement on OPTICS (Ordering Points To Identify the Clustering Structure). The proposed algorithm finds the demarcation point (DP) in the Augmented Cluster-Ordering generated by OPTICS and uses the reachability-distance of the DP as the neighborhood radius eps of its corresponding cluster. This overcomes the weakness of most algorithms in clustering datasets with uneven densities. By computing the distance to the k-nearest neighbor of each point, it reduces the time complexity of OPTICS; by calculating density-mutation points within the clusters, it can efficiently recognize noise. The experimental results show that FOP-OPTICS has the lowest time complexity and outperforms other algorithms in parameter setting and noise recognition.
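The quantity the abstract refers to, the distance from each point to its k-th nearest neighbor, can be sketched as follows. This is a brute-force illustration only: it is not the FOP-OPTICS procedure and does not reproduce its complexity reduction. The function name `k_distances` and the sample points are assumptions.

```python
import math


def k_distances(points, k):
    """Distance from each point to its k-th nearest other point (brute force)."""
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


# Hypothetical points: three close together and one far away.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
kd = k_distances(points, k=2)
```

A point in a sparse region, such as the last one here, gets a much larger k-distance than points in a dense region, which is why this value is a common density proxy.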

PERBANDINGAN METODE HOT-DECK IMPUTATION DAN METODE KNNI DALAM MENGATASI MISSING VALUES [Comparison of the Hot-Deck Imputation Method and the KNNI Method in Handling Missing Values]

Seminar Nasional Official Statistics ◽

10.34123/semnasoffstat.v2019i1.101 ◽

2020 ◽

Vol 2019 (1) ◽

pp. 275-285

Author(s):

Iman Jihad Fadillah ◽

Siti Muchlisoh

Keyword(s):

Missing Values ◽

Nearest Neighbor ◽

K Nearest Neighbor ◽

Running Time ◽

Hot Deck Imputation ◽

Nearest Neighbor Imputation

One characteristic of high-quality statistical data is completeness. In censuses and surveys, however, missing or incomplete data (missing values) are frequently encountered, and the data of Survei Sosial Ekonomi Indonesia (Susenas) are no exception. Missing values can cause a variety of problems and therefore must be handled. Imputation is a commonly used way of handling this problem, and several imputation methods have been developed for it. Hot-deck Imputation and K-Nearest Neighbor Imputation (KNNI) are two methods that can be used to handle missing values. Both use predictor variables to carry out the imputation and do not require complicated assumptions. Because the two methods differ in their algorithms and in how they handle missing values, they can produce different estimates. This study compares Hot-deck Imputation and KNNI in handling missing values. The comparison examines estimator accuracy through RMSE and MAPE values. Computational performance is also measured through the running time of the imputation process. Applying both methods to the March 2017 Susenas data shows that KNNI yields better estimator accuracy than Hot-deck Imputation, whereas Hot-deck Imputation gives better computational performance than KNNI.
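The comparison described above can be illustrated with a small sketch of nearest-neighbor imputation and the two accuracy measures. It assumes the imputed value is the mean of the k nearest donors' values and uses Euclidean distance on the predictors; the abstract does not state these details, and the names `knn_impute`, `rmse`, `mape` and the donor records are hypothetical.

```python
import math


def knn_impute(donors, predictors, k):
    """Impute a missing target as the mean target of the k nearest donors.

    donors: list of (predictor_tuple, target_value) for complete records.
    predictors: predictor tuple of the record whose target is missing.
    """
    ranked = sorted(donors, key=lambda d: math.dist(d[0], predictors))
    nearest = ranked[:k]
    return sum(target for _, target in nearest) / len(nearest)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be nonzero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical complete records: (predictors, target).
donors = [((1.0, 1.0), 10.0), ((1.0, 2.0), 12.0), ((9.0, 9.0), 50.0)]
imputed = knn_impute(donors, (1.0, 1.5), k=2)
```

Sorting all donors for every incomplete record is one reason a nearest-neighbor approach can run slower than a hot-deck draw, consistent with the running-time result reported above.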

A GRAPH-BASED APPROACH TO DETECT ABNORMAL SPATIAL POINTS AND REGIONS

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213011000309 ◽

2011 ◽

Vol 20 (04) ◽

pp. 721-751 ◽

Cited By ~ 6

Author(s):

CHANG-TIEN LU ◽

RAIMUNDO F. DOS SANTOS ◽

XUTONG LIU ◽

YUFENG KOU

Keyword(s):

Time Complexity ◽

Nearest Neighbor ◽

Census Data ◽

High Weight ◽

Transportation Systems ◽

Detection Methods ◽

K Nearest Neighbor ◽

Isolated Points ◽

Neighbor Relationship ◽

Spatial Outliers

Spatial outliers are spatial objects whose nonspatial attribute values differ markedly from those of their spatial neighbors. Identifying spatial outliers is an important task for data mining researchers and geographers. A number of algorithms have been developed to detect spatial anomalies in meteorological images, transportation systems, and contagious disease data. In this paper, we propose a set of graph-based algorithms to identify spatial outliers. Our method first constructs a graph based on the k-nearest neighbor relationship in the spatial domain, assigns the differences in nonspatial attributes as edge weights, and continuously cuts high-weight edges to identify isolated points or regions that are very dissimilar to their neighboring objects. The proposed algorithms have three major advantages over existing spatial outlier detection methods: they detect both point and region outliers accurately, they avoid false outliers, and they can compute the local outlierness of an object within subgraphs. We present the time complexity of the algorithms and show experiments conducted on US housing and Census data to demonstrate the effectiveness of the proposed approaches.
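The graph idea described above can be sketched in simplified form: build a k-nearest-neighbor graph on the coordinates, weight each edge by the attribute difference, cut heavy edges, and report points left without any edge. This is a one-pass simplification with a fixed threshold, not the authors' iterative algorithms; the names `knn_edges`, `isolated_after_cut`, the `max_diff` parameter and the sample data are assumptions.

```python
import math


def knn_edges(coords, k):
    """Undirected edges (i, j), i < j, linking each point to its k nearest points."""
    edges = set()
    for i, p in enumerate(coords):
        others = sorted((j for j in range(len(coords)) if j != i),
                        key=lambda j: math.dist(p, coords[j]))
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def isolated_after_cut(coords, values, k, max_diff):
    """Indices left with no edges once edges whose attribute difference
    exceeds max_diff are cut."""
    kept = {e for e in knn_edges(coords, k)
            if abs(values[e[0]] - values[e[1]]) <= max_diff}
    connected = {i for edge in kept for i in edge}
    return sorted(set(range(len(coords))) - connected)


# Hypothetical data: four corner points and one centre point.
coords = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5)]
values = [10.0, 11.0, 10.5, 10.2, 40.0]  # the centre point has an unusual value
outliers = isolated_after_cut(coords, values, k=2, max_diff=5.0)
```

Here every edge touching the centre point is cut, so it is the only index returned. Detecting region outliers would additionally require inspecting small connected components, which this sketch omits.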

Modified Support Vector Machine Algorithm to Reduce Misclassification and Optimizing Time Complexity

Big Data Analytics for Satellite Image Processing and Remote Sensing - Advances in Computer and Electrical Engineering ◽

10.4018/978-1-5225-3643-7.ch003 ◽

2018 ◽

pp. 34-56

Author(s):

Aditya Ashvin Doshi ◽

Prabu Sevugan ◽

P. Swarnalatha

Keyword(s):

Support Vector Machine ◽

Time Complexity ◽

Nearest Neighbor ◽

Mining Machine ◽

Support Vector ◽

Classification Problems ◽

K Nearest Neighbor ◽

Time Consumption ◽

Data Points ◽

Existing Data

A number of methodologies are available in data mining, machine learning, and pattern recognition for solving classification problems. In the past few years, the retrieval and extraction of information from large amounts of data has grown rapidly. Classification is a stepwise process of predicting responses from existing data. Existing prediction algorithms include the support vector machine and k-nearest neighbor, but each has drawbacks depending on the type of data. To reduce misclassification, a new support vector machine methodology is introduced: instead of placing the hyperplane exactly in the middle, its position is changed according to the number of data points of each class near the hyperplane. To reduce the computation time of the classification algorithm, a multi-core architecture is used to compute more than one independent module simultaneously. Together these changes reduce misclassification and speed up computing the class of a data point.

A NEW TRAFFIC SPEED FORECASTING METHOD BASED ON BI-PATTERN RECOGNITION

Fluctuation and Noise Letters ◽

10.1142/s0219477511000405 ◽

2011 ◽

Vol 10 (01) ◽

pp. 59-75 ◽

Cited By ~ 14

Author(s):

JING WANG ◽

PENGJIAN SHANG ◽

XIAOJUN ZHAO

Keyword(s):

Pattern Recognition ◽

Traffic Control ◽

Nearest Neighbor ◽

Traffic Prediction ◽

K Nearest Neighbor ◽

Short Term ◽

Current State ◽

Pattern Size ◽

The Future ◽

Traffic Speed

Short-term traffic forecasting plays a key role in supporting proactive and dynamic traffic control systems. K-nearest neighbor (KNN) nonparametric regression models have been widely used in traffic prediction. KNN models predict on the assumption that the future state of traffic speed is completely determined by the current state, with no dependence on the past sequences of traffic speed that produced the current state. In fact, traffic speed is not completely random in nature, and some patterns repeat in the traffic stream. In this paper, we propose a methodology called the bi-pattern recognition KNN model (BKNN), which uses a pattern recognition technique twice in the searching process to predict the future traffic state. The proposed BKNN model is then applied to predict one-day real traffic speed series at two sites, located near the North 2nd and 3rd Ring Roads in Beijing, respectively. With the optimal neighbor and pattern size, the BKNN model provides good predictions. Moreover, in comparison with the KNN model, the PKNN model (a modified model based on KNN), the seasonal autoregressive integrated moving average (SARIMA) model, and artificial neural networks (ANN), the BKNN model appears to be the most promising and robust of the five models for short-term traffic prediction.
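A plain nearest-neighbor forecast of a time series can be sketched as below: find the k past windows most similar to the latest window and average the values that followed them. This is a generic illustration, not the BKNN model; the function name `knn_forecast`, the `window` parameter and the speed values are assumptions. Setting `window=1` corresponds to matching on the current state only.

```python
def knn_forecast(series, window, k):
    """Predict the next value of `series` by averaging what followed the k
    past windows most similar to the latest window (squared Euclidean distance)."""
    if len(series) < window + k:
        raise ValueError("series too short for the chosen window and k")
    current = series[-window:]
    candidates = []
    # Each past window must have a known successor, so the latest window is excluded.
    for start in range(len(series) - window):
        past = series[start:start + window]
        dist = sum((a - b) ** 2 for a, b in zip(past, current))
        candidates.append((dist, series[start + window]))
    candidates.sort(key=lambda c: c[0])
    nearest = candidates[:k]
    return sum(nxt for _, nxt in nearest) / len(nearest)


speeds = [60, 55, 40, 60, 55, 41, 60, 55]  # hypothetical speeds in km/h
prediction = knn_forecast(speeds, window=2, k=2)
```

In this toy series the pattern (60, 55) was followed by 40 and by 41, so the forecast is their mean.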

Machine Learning Verdict of EEG Signals in Brain Computer Interface

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit1838114 ◽

2018 ◽

pp. 429-441

Author(s):

M. Jeyanthi ◽

C. Velayutham

Keyword(s):

Nearest Neighbor ◽

Technology Development ◽

Vital Role ◽

Svm Classifier ◽

K Nearest Neighbor ◽

Data Mining Technique ◽

Data Set ◽

Eeg Data ◽

Irrelevant Attributes

Brain-computer interfaces (BCI) play a vital role in research in science and technology development. Classification is a data mining technique used to predict group membership for data instances. Analyzing BCI data is challenging because feature extraction and classification of these data are more difficult than for raw data. In this paper, we extract statistical Haralick features from the raw EEG data. The features are then normalized, and binning is used to improve the accuracy of the predictive models by reducing noise and eliminating some irrelevant attributes. Classification is then performed on the BCI dataset using different techniques: Naïve Bayes, k-nearest neighbor, and SVM classifiers. Finally, we propose the SVM classification algorithm for the BCI dataset.
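The normalization and binning steps mentioned above can be sketched as follows. The abstract does not say which variants were used, so min-max scaling and equal-width bins are assumptions here, as are the function names and the sample column.

```python
def min_max_normalize(values):
    """Rescale values linearly to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Replace each value in [0, 1] by the index of its equal-width bin."""
    return [min(int(v * n_bins), n_bins - 1) for v in values]


features = [2.0, 4.0, 6.0, 10.0]  # hypothetical feature column
normalized = min_max_normalize(features)
binned = equal_width_bins(normalized, n_bins=4)
```

Binning discards small differences inside a bin, which is how it can dampen noise before classification.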

PENENTUAN DAERAH PRIORITAS PELAYANAN AKTA KELAHIRAN DENGAN METODE K-NN DAN K-MEANS [Determining Priority Areas for Birth Certificate Services Using the K-NN and K-Means Methods]

Komputasi: Jurnal Ilmiah Ilmu Komputer dan Matematika ◽

10.33751/komputasi.v17i1.1735 ◽

2020 ◽

Vol 17 (1) ◽

pp. 319-328

Author(s):

Ade Muchlis Maulana Anwar ◽

Prihastuti Harsani ◽

Aries Maesya

Keyword(s):

Nearest Neighbor ◽

Information Gain ◽

Birth Certificate ◽

Population Data ◽

Community Services ◽

Birth Certificates ◽

Similar Data ◽

K Nearest Neighbor ◽

Civil Registration ◽

The Family

Population data are individual or aggregate data structured as a result of Population Registration and Civil Registration activities. A birth certificate is a civil registration deed that records the birth of a baby; the birth is reported so that the baby is registered on the Family Card and given a Population Identification Number (NIK), which is the basis for obtaining other community services. Of the 570,637 integrated birth certificate reports in the 2018 Population Administration Information System (SIAK), 503,946 were reported late and only 66,691 were reported publicly. Clustering is a method that places similar data in one group and dissimilar data in other groups. K-Nearest Neighbor is a method that classifies objects based on the training data closest to the test data. K-means is a method that divides a number of objects into groups by looking at the group midpoints. In preprocessing, the data are cleaned by filling in blank values with the most frequent value, and attributes are selected using the information gain method. Using the k-nearest neighbor method to predict reporting delays and the k-means method to cluster priority service areas, on 10,000 birth certificate records from 2019, performance is good enough: predictions reach an accuracy of 74.00%, and k-means with K = 2 produces a Davies-Bouldin index of 1,179.
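Attribute selection by information gain, mentioned above, can be sketched as follows. The records and field names (`district`, `parent_has_id`, `late`) are hypothetical and are not taken from the paper's dataset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of `target` minus its weighted entropy after splitting on `attribute`."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical records; the field names are illustrative, not from the paper.
rows = [
    {"district": "A", "parent_has_id": "yes", "late": "no"},
    {"district": "A", "parent_has_id": "no", "late": "yes"},
    {"district": "B", "parent_has_id": "yes", "late": "no"},
    {"district": "B", "parent_has_id": "no", "late": "yes"},
]
gain_id = information_gain(rows, "parent_has_id", "late")
gain_district = information_gain(rows, "district", "late")
```

In this toy table `parent_has_id` separates the classes perfectly while `district` tells nothing about lateness, so a selection step would keep the former and drop the latter.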

A Scalable K-Nearest Neighbor Algorithm for Recommendation System Problems

2020 43rd International Convention on Information, Communication and Electronic Technology (MIPRO) ◽

10.23919/mipro48935.2020.9245195 ◽

2020 ◽

Author(s):

A. Sagdic ◽

C. Tekinbas ◽

E. Arslan ◽

T. Kucukyilmaz

Keyword(s):

Recommendation System ◽

Nearest Neighbor ◽

K Nearest Neighbor ◽

Nearest Neighbor Algorithm ◽

K Nearest Neighbor Algorithm

Optimizing Error Rate in Intrusion Detection System Using Artificial Neural Network Algorithm

International Journal of Emerging Research in Management and Technology ◽

10.23956/ijermt.v6i9.102 ◽

2018 ◽

Vol 6 (9) ◽

pp. 152

Author(s):

S. Vijaya Rani ◽

G. N. K. Suresh Babu

Keyword(s):

Neural Network ◽

Artificial Neural Network ◽

Intrusion Detection ◽

Error Rate ◽

Learning Process ◽

Nearest Neighbor ◽

Detection System ◽

Support Vector ◽

K Nearest Neighbor ◽

Artificial Neural

Illegal hackers penetrate the servers and networks of corporate and financial institutions to gain money and extract vital information. Hacking ranges from a single computing system to many systems. Attackers gain access by sending malicious packets into the network through viruses, worms, Trojan horses, and so on. They scan a network with various tools and collect information about the network and its hosts. It is therefore essential to detect attacks as they enter a network. The methods available for intrusion detection include Naive Bayes, decision trees, support vector machines, k-nearest neighbor, and artificial neural networks. A neural network consists of processing units connected in a complex manner and is able to store information and make it available for use. It acts like the human brain, acquiring knowledge from the environment through a training and learning process. Many algorithms are available for the learning process. This work analyzes malicious packets and predicts the error rate in detecting injured packets using artificial neural network algorithms.

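Perancangan Aplikasi Prediksi Kelulusan Tepat Waktu Bagi Mahasiswa Baru Dengan Teknik Data Mining (Studi Kasus: Data Akademik Mahasiswa STMIK Dipanegara Makassar) [Designing an Application to Predict On-Time Graduation of New Students Using Data Mining Techniques (Case Study: Student Academic Data of STMIK Dipanegara Makassar)]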

Creative Information Technology Journal ◽

10.24076/citec.2014v1i4.27 ◽

2015 ◽

Vol 1 (4) ◽

pp. 270

Author(s):

Muhammad Syukri Mustafa ◽

I. Wayan Simpen

Keyword(s):

Data Mining ◽

Nearest Neighbor ◽

Test Results ◽

K Nearest Neighbor ◽

Accuracy Rate ◽

Sample Data ◽

New Students ◽

K Nearest Neighbor Algorithm ◽

Using Data ◽

Existing Data

This research aims to predict whether new students are likely to complete their studies on time, using data mining analysis to explore accumulated historical data with the K-Nearest Neighbor (KNN) algorithm. The application produced in this research uses various attributes, including national exam (Ujian Nasional, UN) scores, school or region of origin, gender, parents' occupation and income, number of siblings, and others. By applying KNN analysis, a prediction can be made from the proximity of existing historical data to new data: whether or not the student is likely to complete their studies on time. Testing the KNN algorithm with sample data from alumni who graduated in 2004 through 2010 as the old cases and alumni who graduated in 2011 as the new cases gave an accuracy rate of 83.36%.
