Clustering Techniques and Their Applications: A Review

2020 ◽  
Vol 1 (4) ◽  
pp. 1-6
Author(s):  
Arjun Dutta

This paper presents a concise study of clustering: existing methods and the developments made over time. Clustering is a form of unsupervised learning in which objects are grouped on the basis of some similarity inherent among them. In recent times, we deal with large volumes of data, including images, video, social text, DNA, and gene information. Data clustering analysis has emerged as an efficient technique for accurately categorizing information into sensible groups. Clustering is closely tied to research in several scientific fields. The k-means algorithm was proposed in 1957 and remains the most popular partitional clustering method to date. Clustering techniques are used in many commercial and non-commercial fields. The applications of clustering in areas such as image segmentation, object and role recognition, and data mining are highlighted. The paper gives a brief description of the existing types of clustering approaches, followed by a survey of these application areas.

2021 ◽  
Vol 1 (1) ◽  
pp. 27-32
Author(s):  
Bambang Setio ◽  
Putri Prasetyaningrum

Yogyakarta is one of the cities in Indonesia with strong tourist appeal and is the destination most favored by tourists, as seen from the number of tourist visits, which rises from year to year. Besides being a tourist city, Yogyakarta is a city of students, a city of culture, and a city of struggle. Because Yogyakarta is known as a tourist city, it offers a wide variety of tourist attractions. In this context, data mining can serve as a solution for analyzing the data. Clustering belongs to the descriptive methods and is also a form of unsupervised learning, in which object classes are not defined beforehand. Clustering can therefore be used to determine class labels for data whose classes are not yet known. The K-Means method belongs to partitioning clustering, which separates data into distinct regions. K-Means is very well known for its simplicity and its ability to cluster large data sets and outliers very quickly. Processing the input data with the K-Means algorithm took 5 iterations, with cluster 1, cluster 2, and cluster 3 chosen at random: cluster 1 contains 24 data points (50%), cluster 2 contains 11 data points (23%), and cluster 3 contains 13 data points (27%).
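The K-Means procedure described in this abstract (random initial clusters, repeated reassignment, per-cluster shares) can be sketched as follows. This is a minimal illustration of Lloyd's algorithm under stated assumptions, not the authors' implementation: the function names `kmeans` and `cluster_shares`, the `init` parameter, and the toy points are hypothetical, and the study's tourism data is not reproduced here.

```python
import random


def kmeans(points, k, init=None, max_iter=100, seed=0):
    """Lloyd's algorithm on a list of equal-length numeric tuples.

    `init` (hypothetical parameter) lets the caller fix the starting
    centroids; otherwise k points are drawn at random.
    """
    rng = random.Random(seed)
    centroids = list(init) if init is not None else rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(d) / len(members) for d in zip(*members))
    return labels, centroids


def cluster_shares(labels, k):
    """Return (count, rounded percentage) for each cluster index."""
    n = len(labels)
    return [(labels.count(c), round(100 * labels.count(c) / n)) for c in range(k)]


# Toy data (an assumption for illustration): three well-separated groups.
toy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
toy_labels, toy_centroids = kmeans(toy, k=3, init=[(0, 0), (10, 10), (20, 0)])
```

Applied to the cluster sizes reported above (24, 11, and 13 of 48 items), `cluster_shares` gives the same 50%, 23%, and 27% split. With random initialization the result can depend on the starting centroids, which is why the toy call fixes them.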


Author(s):  
Baoying Wang ◽  
Imad Rahal ◽  
Richard Leipold

Data clustering is a discovery process that partitions a data set into groups (clusters) such that data points within the same group have high similarity while being very dissimilar to points in other groups (Han & Kamber, 2001). The ultimate goal of data clustering is to discover natural groupings in a set of patterns, points, or objects without prior knowledge of any class labels. In fact, in the machine-learning literature, data clustering is typically regarded as a form of unsupervised learning as opposed to supervised learning. In unsupervised learning or clustering, there is no training function as in supervised learning. There are many applications for data clustering including, but not limited to, pattern recognition, data analysis, data compression, image processing, understanding genomic data, and market-basket research.


Author(s):  
R. Buli Babu ◽  
G. Snehal ◽  
Aditya Satya Kiran

Data mining can be used to detect and model crime problems. This paper discusses the importance of data mining, its techniques, and how they can help solve crimes. Crime data is stored in a criminals' database. To analyze the data easily, we use the data mining technique of clustering. Clustering is a method of grouping items with identical characteristics, in which the similarity is maximized or minimized. There are different types of clustering algorithms; in this paper we use the k-means algorithm and the expectation-maximization algorithm, because both fall under partition algorithms. Partition algorithms are among the best methods for solving crimes and for finding similar data and grouping it. The k-means algorithm partitions the grouped objects based on their means. The expectation-maximization algorithm is an extension of the k-means algorithm in which the data is partitioned based on its parameters.
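To make the contrast between k-means and expectation-maximization concrete, here is a minimal sketch of EM for a one-dimensional Gaussian mixture. It is an illustration only, not the authors' method: the function name `em_gmm_1d`, the starting values, and the toy numbers are assumptions. Where k-means assigns each point to exactly one mean, EM keeps a soft responsibility per component and re-estimates each component's weight, mean, and variance.

```python
import math


def em_gmm_1d(xs, init_means, n_iter=50):
    """EM for a 1-D Gaussian mixture; one component per initial mean."""
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k        # assumed starting variance
    weights = [1.0 / k] * k      # equal starting mixing weights
    resp = []
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = [
                w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate parameters from the weighted points.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # floor avoids a collapsed component
    return weights, means, variances, resp


# Toy data (an assumption for illustration): two groups near 1.0 and 5.0.
data = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
weights, means, variances, resp = em_gmm_1d(data, init_means=[0.0, 6.0])
# Hard labels, if needed, are the component with the largest responsibility.
hard_labels = [max(range(len(r)), key=r.__getitem__) for r in resp]
```

The variance floor is a design choice for this sketch; real crime data would typically be multi-dimensional and need a full covariance treatment.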


2015 ◽  
Vol 2 (1) ◽  
pp. 23-38 ◽  
Author(s):  
Aparna K. ◽  
Mydhili K. Nair

Data clustering has found significant applications in domains such as bioinformatics, medical data, imaging, marketing studies, and crime analysis. There are several types of data clustering, such as partitional, hierarchical, spectral, density-based, and mixture-modeling, to name a few. Among these, partitional clustering is well suited to most applications because of its lower computational requirements. A review of the available literature on partitional clustering not only provides good background knowledge but also helps identify recent problems in the partitional clustering domain. Accordingly, a comprehensive study of the literature on partitional data clustering techniques is undertaken. In this paper, thirty-three research articles from standard publishers, published from 2005 to 2013, are surveyed under two aspects: the technical aspect and the application aspect. The technical aspect is further classified into partitional clustering, constraint-based partitional clustering, and evolutionary programming-based clustering techniques. Furthermore, an analysis is carried out to determine the importance of the different approaches that can be adopted, so that researchers can more easily pursue new developments in partitional data clustering.


2020 ◽  
Vol 4 (3) ◽  
pp. 744
Author(s):  
Murdiaty Murdiaty ◽  
Angela Angela ◽  
Chatrine Sylvia

Indonesia has fertile soil, natural resources, and abundant marine resources. However, Indonesia is also not immune to the risk of natural disasters, which are series of events that disturb and threaten life safety and cause material and non-material losses. Indonesia's strategic geological location causes it to be frequently hit by earthquakes, volcanic eruptions, and other natural disasters. From the data collected, natural disasters that occurred in Indonesia fall into several categories: earthquakes, volcanic eruptions, floods, landslides, tornados, and tsunamis. Many natural disasters in Indonesia have caused casualties, both fatalities and injuries, devastated the surrounding area, destroyed infrastructure, and caused property losses. The trend of increasing incidence of natural disasters needs to be investigated further to prevent the number of victims from increasing. This information can be obtained through a data mining approach, given the large amount of data available. For natural disaster data, clustering techniques in data mining are very useful for grouping records that share the same characteristics, so that the data can serve as groundwork for predicting future natural disaster events. Thus, this research aims to group natural disaster data into several groups using the k-means clustering algorithm, in terms of natural disaster type, time of disaster, number of victims, and damage to various facilities as a result of natural disasters.


2018 ◽  
Vol 7 (2.32) ◽  
pp. 111
Author(s):  
Y Vijay Bhaskhar Reddy ◽  
Dr L.S.S Reddy ◽  
Dr S.S.N. Reddy

Data extraction, data processing, pattern mining, and clustering are important features of data mining. The extraction of data and the formation of interesting patterns from huge datasets can be used in prediction and decision making for further analysis. This increases the need for efficient and effective analysis methods to make use of this data. Clustering is one important technique in data mining. In clustering, a set of items is divided into several clusters such that inter-cluster similarity is minimized and intra-cluster similarity is maximized. Clustering techniques are well suited to class identification in large databases. However, their application to large databases raises the following requirements for clustering techniques: minimal requirements of domain knowledge to determine the input specifications, discovery of clusters of arbitrary shape, and good efficiency on large databases. The existing clustering techniques offer no solution to this combination of requirements. The proposed clustering technique, DBSCAN using KNN, relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape.
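The density-based notion of clusters can be sketched with plain DBSCAN. This is a minimal illustration, not the KNN-based variant the abstract proposes, whose details are not given here. The function name `dbscan`, the parameters `eps` (neighborhood radius) and `min_pts` (minimum neighbors for a core point), and the toy points are assumptions.

```python
def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, with -1 meaning noise."""

    def neighbors(i):
        # Indices of all points within distance eps of point i (itself included).
        return [
            j for j, q in enumerate(points)
            if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2
        ]

    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1           # point i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is also core, so keep expanding
    return labels


# Toy data (an assumption for illustration): two dense squares and one outlier.
pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10), (11, 11),
       (50, 50)]
db_labels = dbscan(pts, eps=1.5, min_pts=3)
```

Because clusters grow by following chains of dense neighborhoods rather than by distance to a centroid, the number of clusters is not fixed in advance and the resulting shapes can be arbitrary. The brute-force neighbor search here is quadratic; an index structure would be needed for large databases.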


2021 ◽  
Vol 3 (1) ◽  
pp. 1-7
Author(s):  
Tuti Hartati ◽  
Odi Nurdiawan ◽  
Eko Wiyandi

The annual admission of new cadet candidates at the Maritime Academy of Marine Sanctuary produces a large amount of data in the form of profiles of prospective cadets. This accumulation of data makes it difficult to identify prospective cadets. This research discusses the application of data mining to group profiles that have similar attributes. One of the data mining techniques used to identify a group of objects that share the same characteristics is cluster analysis. K-means is a clustering method that divides the data into one or more clusters whose members share the same characteristics. The method the authors use is knowledge discovery in databases (KDD), consisting of data, data cleaning, data transformation, data mining, pattern evaluation, and knowledge. The K-means clustering process is implemented using Rapid Miner. The attributes used are NIT, Level, Name, Student Status, Type of Registration, Gender, Place of Birth, Date of Birth, Religion, School Origin, School Origin Department, Religion, GPA, Subdistrict, District/City, and Province. The number of clusters tested goes up to 30 (k = 30). Based on the Davies-Bouldin test of the K-means algorithm, the value closest to 0 is obtained at k = 29, with a Davies-Bouldin index of 0.070; the largest cluster is cluster 16, which contains 115 items.
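The Davies-Bouldin index used to choose k can be sketched as below. It is a minimal illustration under stated assumptions, not Rapid Miner's implementation: the function name `davies_bouldin` and the toy points are hypothetical. For each cluster, the index takes the worst ratio of combined within-cluster scatter to centroid separation, then averages over clusters, so values closer to 0 indicate compact, well-separated clusters.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index for a labeled partition (needs at least 2 clusters)."""
    clusters = sorted(set(labels))
    centroids, scatter = {}, {}
    for c in clusters:
        members = [p for p, l in zip(points, labels) if l == c]
        centroids[c] = tuple(sum(d) / len(members) for d in zip(*members))
        # Scatter: mean Euclidean distance of members to their centroid.
        scatter[c] = sum(math.dist(p, centroids[c]) for p in members) / len(members)
    total = 0.0
    for i in clusters:
        # Worst-case similarity between cluster i and any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters if j != i
        )
    return total / len(clusters)


# Toy data (an assumption for illustration): two tight pairs far apart.
sample = [(0, 0), (0, 2), (10, 0), (10, 2)]
good = davies_bouldin(sample, [0, 0, 1, 1])  # compact, well-separated partition
bad = davies_bouldin(sample, [0, 1, 0, 1])   # each cluster straddles both pairs
```

To select k as in the study, one would run K-means for each candidate k, compute the index for each labeling, and keep the k with the smallest value.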

