K-Means Algorithm: An Assessment

In data mining, clustering is a technique in which a set of objects is assigned to groups called clusters. Clustering is an essential part of data mining. K-means is a basic clustering technique and one of the most widely used algorithms. It is closely related to nearest-neighbor searching, because each object is assigned to its nearest cluster center. It simply partitions a dataset into a given number of clusters. Numerous efforts have been made to improve the performance of the K-means clustering algorithm. In this paper we briefly review the work carried out by different researchers using K-means clustering. We also discuss the limitations and applications of the algorithm. The paper thus presents a current review of the K-means clustering algorithm.
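
As a rough illustration of the basic procedure the abstract refers to, the following is a minimal sketch of standard (Lloyd-style) K-means in pure Python. It is not taken from the reviewed paper; the function name `kmeans`, its parameters, and the sample points are assumptions chosen for this example.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd-style k-means on a list of equal-length numeric tuples.

    Illustrative sketch only: names and defaults are assumptions.
    Returns (centroids, labels).
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as initial centroids
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical sample data: two well-separated groups of three points.
sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(sample, k=2)
```

The number of clusters `k` has to be supplied by the caller, which is the limitation several of the papers listed below try to address.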

A dynamic K-means clustering for data mining

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v13.i2.pp521-526 ◽

2019 ◽

Vol 13 (2) ◽

pp. 521

Author(s):

Md. Zakir Hossain ◽

Md. Nasim Akhtar ◽

R.B. Ahmad ◽

Mostafijur Rahman

Keyword(s):

Data Mining ◽

Clustering Algorithm ◽

Large Data ◽

Threshold Value ◽

Specific Pattern ◽

Large Data Sets ◽

Data Sets ◽

Data Set ◽

Number Of Clusters ◽

Data Points

Data mining is the process of finding structure in large data sets. With this process, decision makers can make informed decisions on real-world problems. Several data clustering techniques are used in data mining to find specific patterns in data. The K-means method is one of the familiar clustering techniques for large data sets. It partitions the data set on the assumption that the number of clusters is fixed. The main problem with this method is that if the number of clusters is chosen too small, there is a higher probability of placing dissimilar items in the same group. On the other hand, if the number of clusters is chosen too high, there is a higher chance of placing similar items in different groups. In this paper, we address this issue by proposing a new K-means clustering algorithm that performs data clustering dynamically. The proposed method initially calculates a threshold value as a centroid of K-means, and the number of clusters is formed based on this value. At each iteration of K-means, if the Euclidean distance between two points is less than or equal to the threshold value, the two data points are placed in the same group. Otherwise, the proposed method creates a new cluster with the dissimilar data point. The results show that the proposed method outperforms the original K-means method.
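
The threshold rule described in this abstract can be sketched roughly as follows. This is a simple leader-style threshold clustering, not the authors' algorithm: the abstract does not say how the threshold is calculated, so here it is assumed to be passed in by the caller, and the name `threshold_clustering` is hypothetical.

```python
import math


def threshold_clustering(points, threshold):
    """Leader-style clustering: join the nearest cluster if it is close enough.

    Assumption: `threshold` is supplied by the caller; the paper's own
    threshold calculation is not described in the abstract.
    Returns (centroids, members), where members[i] lists the points of cluster i.
    """
    centroids = []
    members = []
    for p in points:
        if centroids:
            nearest = min(
                range(len(centroids)),
                key=lambda c: math.dist(p, centroids[c]),
            )
            if math.dist(p, centroids[nearest]) <= threshold:
                # Close enough: add the point and recompute that centroid.
                members[nearest].append(p)
                group = members[nearest]
                centroids[nearest] = tuple(
                    sum(axis) / len(group) for axis in zip(*group)
                )
                continue
        # Too far from every existing cluster: start a new one.
        centroids.append(tuple(p))
        members.append([p])
    return centroids, members
```

With this rule the number of clusters is an outcome of the data and the threshold rather than an input, which is the behaviour the abstract describes as dynamic.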

An effective and efficient hierarchical K-means clustering algorithm

International Journal of Distributed Sensor Networks ◽

10.1177/1550147717728627 ◽

2017 ◽

Vol 13 (8) ◽

pp. 155014771772862 ◽

Cited By ~ 8

Author(s):

Jianpeng Qi ◽

Yanwei Yu ◽

Lihong Wang ◽

Jinglei Liu ◽

Yingjie Wang

Keyword(s):

Data Mining ◽

Clustering Algorithm ◽

Hierarchical Optimization ◽

Clustering Method ◽

Number Of Clusters ◽

Computation Cost ◽

Optimization Principle ◽

Pruning Strategy ◽

Efficiency And Effectiveness ◽

Synthetic Datasets

K-means plays an important role in different fields of data mining. However, k-means is often sensitive to its random seed selection. Motivated by this, this article proposes an optimized k-means clustering method, named k*-means, along with three optimization principles. First, we propose a hierarchical optimization principle initialized by k* seeds ([Formula: see text]) to reduce the risk of random seed selection, and then use the proposed “top-n nearest clusters merging” to merge the nearest clusters in each round until the number of clusters reaches [Formula: see text]. Second, we propose an “optimized update principle” that incrementally updates the mean and [Formula: see text] of a cluster from the points that moved, instead of recalculating them in each k-means iteration, to minimize computation cost. Third, we propose a “cluster pruning strategy” to improve the efficiency of k-means. This strategy omits the farther clusters to shrink the adjustable space in each iteration. Experiments on real UCI and synthetic datasets verify the efficiency and effectiveness of the proposed algorithm.
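
The idea of updating a cluster mean incrementally when a point moves, rather than recomputing it from all members, can be illustrated with the standard running-mean identities below. This is a generic sketch, not the paper's update principle; it covers only the mean (the second quantity is elided in the abstract), and the function names are assumptions.

```python
def add_point(mean, n, p):
    """Mean of a cluster after adding point p to its n current members."""
    return tuple((n * m + x) / (n + 1) for m, x in zip(mean, p))


def remove_point(mean, n, p):
    """Mean of a cluster after removing point p from its n members (requires n > 1)."""
    return tuple((n * m - x) / (n - 1) for m, x in zip(mean, p))


# Moving (4, 4) out of a cluster {(0, 0), (2, 2), (4, 4)} whose mean is (2, 2):
mean_after_removal = remove_point((2.0, 2.0), 3, (4.0, 4.0))
```

Each update costs time proportional to the number of dimensions, independent of the cluster size.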

AK-means: an automatic clustering algorithm based on K-means

Journal of Advanced Computer Science & Technology ◽

10.14419/jacst.v4i2.4749 ◽

2015 ◽

Vol 4 (2) ◽

pp. 231 ◽

Cited By ~ 1

Author(s):

Omar Kettani ◽

Faical Ramdani ◽

Benaissa Tadili

Keyword(s):

Data Mining ◽

Fast Algorithm ◽

Clustering Algorithm ◽

Data Sets ◽

Number Of Clusters ◽

Correct Number ◽

Standard Data ◽

Exact Number ◽

Automatic Clustering ◽

Clustering Problems

In data mining, K-means is a simple and fast algorithm for solving clustering problems, but it requires the user to provide the exact number of clusters (k) in advance, which is often not obvious. This paper aims to overcome this problem by proposing a parameter-free algorithm for automatic clustering, based on successively restarting the K-means algorithm. Experiments conducted on several standard data sets demonstrate that the proposed approach is effective and outperforms the well-known related algorithm G-means in terms of clustering accuracy and estimation of the correct number of clusters.

SELECTION OF THE NUMBER OF CLUSTERS IN K-MEAN ALGORITHM USING CLUSTER SOLUTION ENTROPY

Vestnik of Ryazan State Radio Engineering University ◽

10.21667/1995-4565-2021-77-81-92 ◽

2021 ◽

Vol 77 ◽

pp. 81-92

Author(s):

V. I. Oreshkov

Keyword(s):

Data Mining ◽

Clustering Algorithm ◽

Cluster Structure ◽

Cluster Solution ◽

Number Of Clusters ◽

Advantages And Disadvantages ◽

Analytical Review ◽

The Mean ◽

Cluster Solutions ◽

Selection Of

The article discusses the problem of choosing the number of clusters in the popular k-means clustering algorithm. An unsuccessful choice of this hyperparameter can produce a cluster structure whose interpretation during data mining leads to false conclusions and, in turn, to incorrect management decisions. The aim of the work is to develop a method for automatically selecting the number of clusters for the k-means algorithm. The article provides an analytical review of known methods for determining the number of clusters, noting their advantages and disadvantages. The proposed approach is based on the elbow method but uses the entropy of cluster solutions instead of the mean squared clustering error. A practical example shows that cluster solution entropy makes it possible to choose the number of clusters even when the approach based on clustering error fails.
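
The abstract does not define how the entropy of a cluster solution is computed. As one possible reading, assumed here purely for illustration, the sketch below takes the Shannon entropy of the cluster-size distribution of a labelling. The function name `cluster_entropy` is hypothetical, and the article's actual definition may differ.

```python
import math
from collections import Counter


def cluster_entropy(labels):
    """Shannon entropy (in bits) of the cluster-size distribution.

    Assumption: this is only one plausible definition of "cluster solution
    entropy"; the source abstract does not give the formula.
    """
    n = len(labels)
    return -sum(
        (count / n) * math.log2(count / n)
        for count in Counter(labels).values()
    )
```

Under this reading, one would compute the value for the labelling produced at each candidate number of clusters and look for a bend in the resulting curve, in the same way the usual elbow method inspects the clustering error.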

The clinical syndrome of malaria in the United States. A current review of diagnosis and treatment for American physicians

Archives of Internal Medicine ◽

10.1001/archinte.129.4.607 ◽

1972 ◽

Vol 129 (4) ◽

pp. 607-616 ◽

Cited By ~ 1

Author(s):

H. S. Heineman

Keyword(s):

United States ◽

The United States ◽

Clinical Syndrome ◽

Diagnosis And Treatment ◽

Current Review ◽

A Current

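Perancangan Aplikasi Prediksi Kelulusan Tepat Waktu Bagi Mahasiswa Baru Dengan Teknik Data Mining (Studi Kasus: Data Akademik Mahasiswa STMIK Dipanegara Makassar) [Design of an Application for Predicting On-Time Graduation of New Students Using Data Mining Techniques (Case Study: Academic Data of STMIK Dipanegara Makassar Students)]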

Creative Information Technology Journal ◽

10.24076/citec.2014v1i4.27 ◽

2015 ◽

Vol 1 (4) ◽

pp. 270

Author(s):

Muhammad Syukri Mustafa ◽

I. Wayan Simpen

Keyword(s):

Data Mining ◽

Nearest Neighbor ◽

Test Results ◽

K Nearest Neighbor ◽

Accuracy Rate ◽

Sample Data ◽

New Students ◽

K Nearest Neighbor Algorithm ◽

Using Data ◽

Existing Data

This research aims to predict whether new students are likely to complete their studies on time, using data mining analysis to explore accumulated historical data with the K-Nearest Neighbor (KNN) algorithm. The application produced in this study uses a variety of attributes, including national examination (Ujian Nasional, UN) scores, school or region of origin, gender, parents' occupation and income, number of siblings, and others. By applying KNN analysis, a prediction can be made based on the proximity of existing historical data to new data, indicating whether a student is likely to complete their studies on time. In tests applying the KNN algorithm, with data on alumni who graduated from 2004 to 2010 as old cases and data on alumni who graduated in 2011 as new cases, an accuracy rate of 83.36% was obtained.
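
The proximity-based prediction described here can be sketched as a plain K-Nearest Neighbor majority vote. This is a generic illustration, not the application built in the study; the function name `knn_predict`, the two numeric features, and the sample records are all hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest cases.

    `train` is a list of (features, label) pairs with numeric feature tuples.
    Illustrative sketch: real attributes such as gender or school of origin
    would first need to be encoded numerically.
    """
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical historical cases: (scaled exam score, scaled income) -> outcome.
history = [
    ((1, 1), "on_time"), ((1, 2), "on_time"), ((2, 1), "on_time"),
    ((8, 8), "late"), ((9, 8), "late"), ((8, 9), "late"),
]
prediction = knn_predict(history, (1.5, 1.5))
```

In practice the features would need comparable scales, since Euclidean distance is otherwise dominated by the attribute with the largest range.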

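Teknik Data Mining Dalam Clustering Produksi Susu Segar Di Indonesia Dengan Algoritma K-Means [Data Mining Techniques for Clustering Fresh Milk Production in Indonesia with the K-Means Algorithm]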

BRAHMANA: Jurnal Penerapan Kecerdasan Buatan ◽

10.30645/brahmana.v1i1.5 ◽

2019 ◽

Vol 1 (1) ◽

pp. 31-39

Author(s):

Ilham Safitra Damanik ◽

Sundari Retno Andani ◽

Dedi Sehendro

Keyword(s):

Data Mining ◽

Milk Production ◽

Clustering Algorithm ◽

Clustering Method ◽

Data Mining Techniques ◽

Low Level ◽

Fresh Milk ◽

Nutritional Needs ◽

High Level ◽

Level Cluster

Milk is an important part of the diet for meeting nutritional needs, for both children and adults. Indonesia has many producers of fresh milk, but production is not sufficient for national demand. Data mining is a field of computer science that is widely used in research, and clustering, which groups data, is one of its techniques. Clustering tends to work better with a large amount of data. The data used are provincial data for Indonesia from 2000 to 2017, obtained from the Central Statistics Agency. The study clusters the regions into two groups: high milk-producing regions and low milk-producing regions. From 27 records of fresh milk production in Indonesia, two provinces fall in the high-level cluster, namely West Java and East Java. The other 25, together with 7 provinces that were not part of the K-Means Clustering calculation, are placed in the low-level cluster.

DRSA: a non-hierarchical clustering algorithm using k-NN graph and its application in vegetation classification

Vegetation of Russia ◽

10.31111/vegrus/2015.27.125 ◽

2015 ◽

pp. 125-138 ◽

Cited By ~ 2

Author(s):

I. V. Goncharenko

Keyword(s):

Cluster Analysis ◽

Clustering Algorithm ◽

Nearest Neighbor ◽

Clustering Algorithms ◽

Protein Structures ◽

Hierarchical Cluster ◽

Vegetation Classification ◽

K Nearest Neighbor ◽

Neighbor Graph ◽

Nearest Neighbor Graph

In this article we propose a new method of non-hierarchical cluster analysis using a k-nearest-neighbor graph and discuss it with respect to vegetation classification. The method of k-nearest neighbor (k-NN) classification was originally developed in 1951 (Fix, Hodges, 1951). Later the term “k-NN graph” and a few algorithms of k-NN clustering appeared (Cover, Hart, 1967; Brito et al., 1997). In biology, k-NN is used in the analysis of protein structures and genome sequences. Most k-NN clustering algorithms first build an “excessive” graph, a so-called hypergraph, and then truncate it into subgraphs by partitioning and coarsening the hypergraph. We developed a different strategy, “upward” clustering, which assembles one cluster after another in sequence. Until now, graph-based cluster analysis has not been considered for the classification of vegetation datasets.
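
For readers unfamiliar with the data structure, a k-NN graph links each object to its k closest neighbors. The brute-force construction below is a minimal sketch of that graph only; it does not implement DRSA or the “upward” clustering strategy, and the name `knn_graph` and the use of Euclidean distance are assumptions.

```python
import math


def knn_graph(points, k):
    """Map each point index to the indices of its k nearest other points.

    Brute-force sketch using Euclidean distance; ties keep index order.
    """
    graph = {}
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        graph[i] = others[:k]
    return graph
```

Vegetation data are usually compared with ecological dissimilarity measures rather than Euclidean distance, so the distance function would be the first thing to swap in a real application.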

K-MEANS CLUSTERING ALGORITHM FOR SERVICE DATA ANALYSIS BASED ON CUSTOMERS COMBINATION

Unes journal of Information System ◽

10.31933/ujis.3.1.001-007.2018 ◽

2018 ◽

Vol 3 (1) ◽

pp. 001

Author(s):

Zulhendra Zulhendra ◽

Gunadi Widi Nurcahyo ◽

Julius Santony

Keyword(s):

Data Mining ◽

Data Analysis ◽

Clustering Algorithm ◽

Customer Complaints ◽

Using Data ◽

Clustering Data ◽

Service Data ◽

Selection Of

This study uses a data mining technique, namely K-means clustering. Data mining can be used to analyze fairly large data sets; here the aim is to enable Indocomputer to understand and classify service data based on customer complaints, using Weka software. The K-means clustering algorithm is used to group complaints about hardware damage at Indocomputer Payakumbuh. The results also show which laptop brands are most frequently serviced at Indocomputer Payakumbuh, which can serve as a recommendation to consumers when selecting a laptop.

Method for determining optimal number of clusters in K-means clustering algorithm

Journal of Computer Applications ◽

10.3724/sp.j.1087.2010.01995 ◽

2010 ◽

Vol 30 (8) ◽

pp. 1995-1998 ◽

Cited By ~ 18

Author(s):

Shi-bing ZHOU ◽

Zhen-yuan XU ◽

Xu-qing TANG

Keyword(s):

Clustering Algorithm ◽

Optimal Number ◽

Number Of Clusters ◽

Optimal Number Of Clusters
