Clustering Techniques and Their Applications: A Review

This paper presents a concise study of clustering, covering existing methods and the developments made over time. Clustering is a form of unsupervised learning in which objects are grouped on the basis of some similarity inherent among them. Today we deal with large volumes of data, including images, video, social text, DNA and gene information. Cluster analysis has emerged as an efficient technique for categorizing such information into sensible groups, and it is closely tied to research in several scientific fields. The k-means algorithm was suggested in 1957 and remains the most popular partitional clustering method to date. Clustering techniques are used in many commercial and non-commercial fields. The paper highlights applications of clustering in areas such as image segmentation, object and role recognition, and data mining. It gives a brief description of the existing types of clustering approaches, followed by a survey of these application areas.
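As an illustration of the partitional approach the abstract mentions, the following is a minimal sketch of k-means in its common Lloyd-style form, written in plain Python. The function names, the optional `init` parameter and the toy data are assumptions made for this example and are not taken from the paper.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, init=None, max_iter=100, seed=0):
    """Alternate assignment and centroid-update steps until labels stop changing."""
    rng = random.Random(seed)
    # Hypothetical convenience: accept explicit starting centroids for reproducibility.
    centroids = list(init) if init is not None else rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


# Toy data (made up): two well-separated groups in the plane.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = k_means(points, k=2, init=[points[0], points[3]])
print(labels)
```

With random initialization the result can depend on the starting centroids, which is why practical implementations usually run several restarts.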

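PENERAPAN DATA MINING DALAM MENGELOMPOKKAN KUNJUNGAN WISATAWAN DI KOTA YOGYAKARTA MENGGUNAKAN METODE K-MEANS (Application of Data Mining to Cluster Tourist Visits in the City of Yogyakarta Using the K-Means Method)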

Journal of Computer Science and Technology (JCS-TECH) ◽

10.54840/jcstech.v1i1.9 ◽

2021 ◽

Vol 1 (1) ◽

pp. 27-32

Author(s):

Bambang Setio ◽

Putri Prasetyaningrum

Keyword(s):

Data Mining ◽

Unsupervised Learning ◽

Data Clustering ◽

Cluster 2

Yogyakarta is one of the cities in Indonesia with strong tourist appeal and is the destination most favored by tourists, as shown by visitor numbers that rise from year to year. Besides being a tourist city, Yogyakarta is known as a city of students, a city of culture and a city of struggle. As a tourist city, Yogyakarta offers a wide variety of attractions. In this context, data mining can serve as a solution for analyzing the data. Clustering belongs to the descriptive methods and is a form of unsupervised learning, in which object classes are not defined beforehand, so clustering can be used to assign class labels to data whose classes are not yet known. The K-Means method is a partitioning clustering technique that separates data into distinct regions. K-Means is very well known for its simplicity and its ability to cluster large data sets and outliers very quickly. Processing the input data with the K-Means algorithm took 5 iterations, with the initial cluster 1, cluster 2 and cluster 3 chosen at random: cluster 1 contains 24 records (50%), cluster 2 contains 11 records (23%) and cluster 3 contains 13 records (27%).
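The reported cluster shares can be checked with a few lines of arithmetic. The sketch below uses only the three cluster sizes stated in the abstract; the variable names are illustrative.

```python
# Cluster sizes as reported in the abstract.
cluster_sizes = {"cluster 1": 24, "cluster 2": 11, "cluster 3": 13}

total = sum(cluster_sizes.values())

# Share of each cluster, rounded to the nearest whole percent.
shares = {name: round(100 * n / total) for name, n in cluster_sizes.items()}

print(total)
print(shares)
```

The three sizes add up to 48 records, and the rounded shares match the 50%, 23% and 27% given in the text.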

Categorization of Data Clustering Techniques

Handbook of Research on Public Information Technology ◽

10.4018/978-1-59904-857-4.ch052 ◽

2008 ◽

pp. 568-577

Author(s):

Baoying Wang ◽

Imad Rahal ◽

Richard Leipold

Keyword(s):

Unsupervised Learning ◽

Supervised Learning ◽

Data Clustering ◽

Analysis Data ◽

Discovery Process ◽

Data Set ◽

Market Basket ◽

Clustering Techniques ◽

Data Points ◽

Class Labels

Data clustering is a discovery process that partitions a data set into groups (clusters) such that data points within the same group have high similarity while being very dissimilar to points in other groups (Han & Kamber, 2001). The ultimate goal of data clustering is to discover natural groupings in a set of patterns, points, or objects without prior knowledge of any class labels. In fact, in the machine-learning literature, data clustering is typically regarded as a form of unsupervised learning as opposed to supervised learning. In unsupervised learning or clustering, there is no training function as in supervised learning. There are many applications for data clustering including, but not limited to, pattern recognition, data analysis, data compression, image processing, understanding genomic data, and market-basket research.

A data mining strategy for inductive data clustering: a synergy between self-organising neural networks and K-means clustering techniques

2000 TENCON Proceedings. Intelligent Systems and Technologies for the New Millennium (Cat. No.00CH37119) ◽

10.1109/tencon.2000.888802 ◽

2002 ◽

Cited By ~ 3

Author(s):

S.S.R. Abidi ◽

J. Ong

Keyword(s):

Data Mining ◽

Neural Networks ◽

Data Clustering ◽

Clustering Techniques ◽

Data Mining Strategy

Detection of Crimes Using Unsupervised Learning Techniques

APTIKOM Journal on Computer Science and Information Technologies ◽

10.34306/csit.v2i1.62 ◽

2020 ◽

Vol 2 (1) ◽

pp. 8-11

Author(s):

R. Buli Babu ◽

G. Snehal ◽

Aditya Satya Kiran

Keyword(s):

Data Mining ◽

Unsupervised Learning ◽

Expectation Maximization ◽

Expectation Maximization Algorithm ◽

Similar Data ◽

Data Mining Technique ◽

Clustering Techniques ◽

Mining Technique ◽

Learning Techniques ◽

Object Based

Data mining can be used to detect and model crime problems. This paper discusses the importance of data mining, its techniques, and how they can help solve crimes. Crime data is stored in a criminal database. To analyze this data easily, we use the data mining technique of clustering. Clustering is a method for grouping items with identical characteristics, in which similarity within a group is maximized and similarity between groups is minimized. Clustering offers different types of algorithms; in this paper we use the k-means algorithm and the expectation-maximization algorithm, because both are partition algorithms. Partition algorithms are among the best methods for solving crimes, finding similar data and grouping it. The k-means algorithm partitions the grouped objects based on their means. The expectation-maximization algorithm is an extension of the k-means algorithm in which the data are partitioned based on their parameters.
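To make the relationship between the two algorithms concrete, here is a minimal sketch of expectation-maximization for a one-dimensional Gaussian mixture. It is not the authors' implementation: the function names, the starting means, the unit starting variances and the toy data are all assumptions made for illustration. Where k-means assigns each point to exactly one mean, EM keeps a soft responsibility per component and also estimates the variance and weight parameters.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, init_means, n_iter=50, min_var=1e-6):
    """Fit a 1-D Gaussian mixture by alternating E-steps and M-steps."""
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k    # assumed starting variance
    weights = [1.0 / k] * k  # assumed equal starting weights
    resp = []
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            dens = [w * gaussian_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, min_var)  # floor avoids a collapsing component
    return weights, means, variances, resp


# Toy data (made up): two groups centred near 1 and 5.
data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
weights, means, variances, resp = em_gmm_1d(data, init_means=[0.0, 6.0])
print(means)
```

Taking the component with the highest responsibility for each point gives a hard partition comparable to a k-means result.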

Comprehensive Study and Analysis of Partitional Data Clustering Techniques

International Journal of Business Analytics ◽

10.4018/ijban.2015010102 ◽

2015 ◽

Vol 2 (1) ◽

pp. 23-38 ◽

Cited By ~ 3

Author(s):

Aparna K. ◽

Mydhili K. Nair

Keyword(s):

Data Clustering ◽

Evolutionary Programming ◽

Technical Aspect ◽

Mixture Modeling ◽

Crime Analysis ◽

Clustering Techniques ◽

Partitional Clustering ◽

New Development ◽

Computational Requirement ◽

Comprehensive Study

Data clustering has found significant applications in domains such as bioinformatics, medical data, imaging, marketing studies and crime analysis. There are several types of data clustering, including partitional, hierarchical, spectral, density-based and mixture-modeling approaches. Among these, partitional clustering is well suited to most applications because of its lower computational requirements. An analysis of the available literature on partitional clustering not only provides good background knowledge but also helps to identify current problems in the partitional clustering domain. Accordingly, this paper undertakes a comprehensive study of the literature on partitional data clustering techniques. Thirty-three research articles published by standard publishers between 2005 and 2013 are surveyed under two aspects, the technical aspect and the application aspect. The technical aspect is further classified into partitional clustering, constraint-based partitional clustering and evolutionary programming-based clustering techniques. Furthermore, an analysis is carried out to determine the importance of the different approaches that can be adopted, so that researchers can more easily develop new work in partitional data clustering.

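Pengelompokkan Data Bencana Alam Berdasarkan Wilayah, Waktu, Jumlah Korban dan Kerusakan Fasilitas Dengan Algoritma K-Means (Clustering Natural Disaster Data by Region, Time, Number of Victims and Facility Damage with the K-Means Algorithm)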

JURNAL MEDIA INFORMATIKA BUDIDARMA ◽

10.30865/mib.v4i3.2213 ◽

2020 ◽

Vol 4 (3) ◽

pp. 744

Author(s):

Murdiaty Murdiaty ◽

Angela Angela ◽

Chatrine Sylvia

Keyword(s):

Data Mining ◽

Natural Disasters ◽

Natural Disaster ◽

Data Clustering ◽

Volcanic Eruptions ◽

Marine Resources ◽

Clustering Techniques ◽

Data Mining Approach ◽

Disaster Data ◽

Property Losses

Indonesia has fertile soil, natural resources and abundant marine resources. However, Indonesia is also not immune to the risk of natural disasters, which are series of events that disturb and threaten life safety and cause material and non-material losses. Indonesia's geological location causes it to be frequently hit by earthquakes, volcanic eruptions and other natural disasters. From the data collected, natural disasters that occurred in Indonesia fall into several categories, namely earthquakes, volcanic eruptions, floods, landslides, tornados and tsunamis. Many natural disasters in Indonesia have caused casualties, both fatalities and injuries, devastated the surrounding area, destroyed infrastructure and caused property losses. The trend of increasing incidence of natural disasters needs to be investigated further to prevent the number of victims from increasing. Given the large amount of data available, this information can be obtained through a data mining approach. For natural disaster data, clustering techniques in data mining are very useful for grouping records that share the same characteristics, so that the data can serve as groundwork for predicting future natural disaster events. This research therefore aims to group natural disaster data into several groups with the k-means algorithm, in terms of natural disaster type, time of disaster, number of victims and damage to various facilities as a result of natural disasters.

Partitional Clustering Techniques for Multi-Spectral Image Segmentation

Journal of Computers ◽

10.4304/jcp.2.10.1-8 ◽

2007 ◽

Vol 2 (10) ◽

Cited By ~ 6

Author(s):

Danielle Nuzillard ◽

Cosmin Lazar

Keyword(s):

Image Segmentation ◽

Spectral Image ◽

Clustering Techniques ◽

Partitional Clustering

An Efficient Density Based Clustering approach for High Dimensional Data

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.32.15381 ◽

2018 ◽

Vol 7 (2.32) ◽

pp. 111

Author(s):

Y Vijay Bhaskhar Reddy ◽

Dr L.S.S Reddy ◽

Dr S.S.N. Reddy

Keyword(s):

Data Mining ◽

Data Clustering ◽

Domain Knowledge ◽

Pattern Mining ◽

Data Extraction ◽

Clustering Techniques ◽

Density Based Clustering ◽

Large Databases ◽

Clustering Approach ◽

Effective Analysis

Data extraction, data processing, pattern mining and clustering are important features of data mining. The extraction of data and the formation of interesting patterns from huge datasets can be used in prediction and decision making for further analysis. This increases the need for efficient and effective analysis methods to make use of this data. Clustering is one important technique in data mining. In clustering, a set of items is divided into several clusters such that inter-cluster similarity is minimized and intra-cluster similarity is maximized. Clustering techniques are attractive for identifying classes in large databases. However, their application to large databases raises the following requirements: minimal domain knowledge needed to determine the input parameters, discovery of clusters with arbitrary shape, and dependable behavior on large databases. The existing clustering techniques offer no solution to this combination of requirements. The proposed clustering technique, DBSCAN using KNN, relies on a density-based notion of clusters and is able to discover clusters of arbitrary shape.
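The density-based notion of a cluster can be sketched with plain DBSCAN, shown below in pure Python. This is the standard algorithm only, not the KNN-based variant the paper proposes, whose details are not given in the abstract. The parameter names `eps` and `min_pts`, the helper names and the toy points are assumptions made for this example.

```python
import math

NOISE = -1  # label used for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already processed
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        # points[i] is a core point: start a new cluster and grow it.
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point
        cluster_id += 1
    return labels


# Toy data (made up): two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

Unlike k-means, the number of clusters is not supplied in advance; it follows from `eps` and `min_pts`, and isolated points are reported as noise rather than forced into a cluster.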

Unsupervised Learning for Data Clustering Based Image Segmentation

Machine Learning-based Natural Scene Recognition for Mobile Robot Localization in An Unknown Environment ◽

10.1007/978-981-13-9217-7_4 ◽

2019 ◽

pp. 63-84

Author(s):

Xiaochun Wang ◽

Xiali Wang ◽

Don Mitchell Wilkes

Keyword(s):

Image Segmentation ◽

Unsupervised Learning ◽

Data Clustering

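Analisis Dan Penerapan Algoritma K-Means Dalam Strategi Promosi Kampus Akademi Maritim Suaka Bahari (Analysis and Application of the K-Means Algorithm in the Campus Promotion Strategy of Akademi Maritim Suaka Bahari)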

Jurnal Sains Teknologi Transportasi Maritim ◽

10.51578/j.sitektransmar.v3i1.30 ◽

2021 ◽

Vol 3 (1) ◽

pp. 1-7

Author(s):

Tuti Hartati ◽

Odi Nurdiawan ◽

Eko Wiyandi

Keyword(s):

Data Mining ◽

Data Clustering ◽

Data Cleaning ◽

Knowledge Discovery In Databases ◽

Birth Date ◽

Cluster Member ◽

Place Of Birth ◽

Clustering Method ◽

Number Of Clusters ◽

Knowledge Implementation

The process of accepting new cadet candidates at the Maritime Academy of Marine Sanctuary (Akademi Maritim Suaka Bahari) produces a large amount of data every year in the form of profiles of prospective cadets. This accumulation of data makes it difficult to identify prospective cadets. This research discusses the application of data mining to generate profiles that share similar attributes. One data mining technique used to identify groups of objects with the same characteristics is cluster analysis. The clustering method used, K-means, divides the data into one or more clusters whose members share the same characteristics. The author follows the knowledge discovery in databases (KDD) process, consisting of data, data cleaning, data transformation, data mining, pattern evaluation and knowledge. The K-means clustering process was implemented in RapidMiner. The attributes used are NIT, Level, Name, Student Status, Type of Registration, Gender, Place of Birth, Date of Birth, Religion, School Origin, School Origin Department, GPA, Subdistrict, District/City and Province. The number of clusters returned is 30 (k = 30). Based on the Davies-Bouldin test on the K-means algorithm, the value closest to 0 is obtained at k = 29, with a Davies-Bouldin index of 0.070; the largest cluster, cluster 16, contains 115 members.
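The Davies-Bouldin test mentioned above scores a partition by comparing within-cluster scatter with the separation between cluster centroids, and values closer to 0 indicate a better partition. The sketch below uses the usual textbook definition with Euclidean distance; it is not the RapidMiner implementation, and the helper names and toy clusters are assumptions made for this example.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (each a list of points)."""
    centroids = [centroid(c) for c in clusters]
    # Scatter: average distance from each member to its cluster centroid.
    scatter = [
        sum(math.dist(p, m) for p in c) / len(c)
        for c, m in zip(clusters, centroids)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in range(k) if j != i
        )
    return total / k


# Toy partitions (made up): compact, well-separated clusters versus loose, close ones.
tight = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]
loose = [[(0, 0), (0, 4)], [(5, 0), (5, 4)]]

db_tight = davies_bouldin(tight)
db_loose = davies_bouldin(loose)
print(db_tight, db_loose)
```

Choosing k then amounts to running the clustering for each candidate k and keeping the one with the lowest index, which is how a result such as k = 29 would be selected.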
