KOMPARASI METODE CLUSTERING K-MEANS DAN K-MEDOIDS DENGAN MODEL FUZZY RFM UNTUK PENGELOMPOKAN PELANGGAN

Elly Muningsih - AMIK BSI Yogyakarta

doi:10.31294/evolusi.v6i2.4600

KOMPARASI METODE CLUSTERING K-MEANS DAN K-MEDOIDS DENGAN MODEL FUZZY RFM UNTUK PENGELOMPOKAN PELANGGAN

Evolusi : Jurnal Sains dan Manajemen ◽

10.31294/evolusi.v6i2.4600 ◽

2018 ◽

Vol 6 (2) ◽

Author(s):

Elly Muningsih - AMIK BSI Yogyakarta

Keyword(s):

Data Clustering ◽

Small Data ◽

Clustering Methods ◽

Monetary Model ◽

Clustering Method ◽

Online Sales ◽

Rfm Model ◽

Potential Customers ◽

Cluster 2 ◽

Better Than

Abstract ~ K-Means is one of the most widely used methods in data clustering research, while K-Medoids is an efficient method for processing small data sets. This study compares the two clustering methods by grouping customers into 3 clusters according to their characteristics: highly potential (loyal) customers, potential customers, and non-potential customers. The data used are online sales transactions. The clustering methods are tested with a Fuzzy RFM (Recency, Frequency and Monetary) model, in which the average (mean) of the three values is taken. The tests show that K-Means performs better than K-Medoids, with an accuracy of 90.47%. The data processing shows that cluster 1 has 16 members (customers), cluster 2 has 11 members, and cluster 3 has 15 members. Keywords: clustering, K-Means method, K-Medoids method, customer, Fuzzy RFM model.
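To make the difference between the two methods concrete, the following is a minimal sketch, not the study's implementation. It assumes Euclidean distance, hand-picked initial centers, and six hypothetical customers with RFM scores on a 1 to 5 scale; the function names `k_means` and `k_medoids` are illustrative. The only difference between the two loops is the update step: K-Means moves each center to the mean of its members, while K-Medoids picks the member with the smallest total distance to the others.

```python
from math import dist

def assign(points, centers):
    """Return, for every point, the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            for p in points]

def k_means(points, centers, iters=100):
    """Lloyd's algorithm: each center becomes the mean of its members."""
    centers = [tuple(c) for c in centers]
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return assign(points, centers), centers

def k_medoids(points, medoids, iters=100):
    """Alternating K-Medoids: each center must be an actual data point."""
    medoids = [tuple(m) for m in medoids]
    for _ in range(iters):
        labels = assign(points, medoids)
        new_medoids = []
        for c in range(len(medoids)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                # The medoid minimises the summed distance to its cluster members.
                new_medoids.append(
                    min(members, key=lambda m: sum(dist(m, q) for q in members)))
            else:
                new_medoids.append(medoids[c])
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(points, medoids), medoids

# Hypothetical (recency, frequency, monetary) scores, not data from the study.
points = [(5, 5, 5), (4, 5, 4), (3, 3, 3), (3, 2, 3), (1, 1, 1), (1, 2, 1)]
init = [points[0], points[2], points[4]]  # assumed starting centers

km_labels, km_centers = k_means(points, init)
kmed_labels, kmed_medoids = k_medoids(points, init)
print(km_labels, km_centers)
print(kmed_labels, kmed_medoids)
```

On this toy input both methods produce the same partition; they differ in the reported centers, since the K-Medoids centers are always real customers.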

On Fuzzy Non-Metric Model for Data with Tolerance and its Application to Incomplete Data Clustering

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2016.p0571 ◽

2016 ◽

Vol 20 (4) ◽

pp. 571-579 ◽

Cited By ~ 1

Author(s):

Yasunori Endo ◽

Tomoyuki Suzuki ◽

Naohiko Kinoshita ◽

Yukihiro Hamasuna ◽

...

Keyword(s):

Data Clustering ◽

Incomplete Data ◽

Clustering Algorithm ◽

Uncertain Data ◽

Data Sets ◽

Membership Degree ◽

Clustering Methods ◽

Clustering Method ◽

Numerical Examples ◽

Metric Model

The fuzzy non-metric model (FNM) is a representative non-hierarchical clustering method. It is useful because the membership degree of each datum in each cluster is calculated directly from the dissimilarities between data, without using cluster centers. However, the original FNM cannot handle data with uncertainty. In this study, we refer to such data as “uncertain data,” e.g., incomplete data or data that contain errors. Previously, a method based on the concept of a tolerance vector was proposed for handling uncertain data, and several clustering methods were constructed according to this concept, e.g., fuzzy c-means for data with tolerance. These methods can handle uncertain data in the framework of optimization. Thus, in the present study, we apply the concept to FNM. First, we propose a new clustering algorithm based on FNM using the concept of tolerance, which we refer to as the fuzzy non-metric model for data with tolerance. Second, we show that the proposed algorithm can handle incomplete data sets. Third, we verify the effectiveness of the proposed algorithm by comparing it with conventional methods for incomplete data sets in some numerical examples.

A review on data clustering using spiking neural network (SNN) models

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v15.i3.pp1392-1400 ◽

2019 ◽

Vol 15 (3) ◽

pp. 1392

Author(s):

Siti Aisyah Mohamed ◽

Muhaini Othman ◽

Mohd Hafizul Afifi

Keyword(s):

Neural Network ◽

Data Clustering ◽

Network Clustering ◽

Complex Data ◽

Spiking Neural Network ◽

Clustering Methods ◽

Clustering Method ◽

Static Data ◽

Clustering Approach ◽

Clustering Problems

Recent advances in artificial neural networks have drawn researchers' interest to deep learning with Spiking Neural Network clustering methods. Spiking Neural Network (SNN) models capture neuronal behaviour more precisely than traditional neural networks because they incorporate time into their functioning model [1]. The aim of this paper is to review studies that address clustering problems using SNN models. Although many algorithms exist for clustering problems, most are suitable only for static data and fixed windows of time series. Hence, there is a need to analyse complex data types, and there is room for improvement. This paper therefore summarizes the significant results obtained by applying SNN models in different clustering approaches. The findings could demonstrate the purpose of SNN-based clustering methods to researchers from various disciplines who seek to discover and understand complex data.

An empirical comparison between stochastic and deterministic centroid initialisation for K-means variations

Machine Learning ◽

10.1007/s10994-021-06021-7 ◽

2021 ◽

Author(s):

Avgoustinos Vouros ◽

Stephen Langdell ◽

Mike Croucher ◽

Eleni Vasilaki

Keyword(s):

Data Clustering ◽

Stochastic Methods ◽

Data Sets ◽

Local Minima ◽

Clustering Methods ◽

Empirical Comparison ◽

Real World Data ◽

Trade Off ◽

Deterministic Methods ◽

Better Than

K-Means is one of the most used algorithms for data clustering and the usual clustering method for benchmarking. Despite its wide application, it is well known to suffer from a series of disadvantages: it is only able to find local minima, and the positions of the initial clustering centres (centroids) can greatly affect the clustering solution. Over the years many K-Means variations and initialisation techniques have been proposed with different degrees of complexity. In this study we focus on common K-Means variations along with a range of deterministic and stochastic initialisation techniques. We show that, on average, more sophisticated initialisation techniques alleviate the need for complex clustering methods. Furthermore, deterministic methods perform better than stochastic methods. However, there is a trade-off: less sophisticated stochastic methods, executed multiple times, can result in better clustering. Factoring in execution time, deterministic methods can be competitive and result in a good clustering solution. These conclusions are obtained through extensive benchmarking using a range of synthetic model generators and real-world data sets.
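The trade-off between one deterministic initialisation and several stochastic restarts can be sketched as follows. This is an illustrative setup, not the paper's benchmark: the farthest-first (maximin) rule stands in for "a deterministic method", uniform random sampling of data points stands in for "a stochastic method", and the three Gaussian blobs, the seed, and the function names are all assumptions.

```python
import random
from math import dist

def lloyd(points, centers, iters=100):
    """Plain K-Means from the given initial centers; returns (centers, sse)."""
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            groups[nearest].append(p)
        new = [tuple(sum(x) / len(g) for x in zip(*g)) if g else centers[i]
               for i, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    # Sum of squared distances from each point to its nearest center.
    sse = sum(min(dist(p, c) for c in centers) ** 2 for p in points)
    return centers, sse

def random_init(points, k, rng):
    """Stochastic initialisation: k distinct data points chosen at random."""
    return rng.sample(points, k)

def maximin_init(points, k):
    """Deterministic farthest-first initialisation, starting from points[0]."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers

rng = random.Random(0)  # fixed seed so the sketch is repeatable
blobs = [(0, 0), (5, 5), (0, 5)]  # hypothetical cluster centres
points = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
          for cx, cy in blobs for _ in range(30)]

det_sse = lloyd(points, maximin_init(points, 3))[1]
runs = [lloyd(points, random_init(points, 3, rng))[1] for _ in range(10)]
best_sse = min(runs)
print(f"deterministic: {det_sse:.2f}  best of 10 random: {best_sse:.2f}")
```

The deterministic run costs one K-Means execution and always returns the same result; the stochastic variant costs ten executions and keeps the lowest sum of squared errors, which is the execution-time trade-off the abstract describes.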

Clustering Algorithm For Determining Marketing Targets Based Customer Purchase Patterns And Behaviors

SinkrOn ◽

10.33395/sinkron.v6i1.11191 ◽

2021 ◽

Vol 6 (1) ◽

pp. 137-143

Author(s):

Amir Mahmud Husein ◽

Februari Kurnia Waruwu ◽

Yacobus M.T. Batu Bara ◽

Meleyaki Donpril ◽

Mawaddah Harahap

Keyword(s):

Clustering Algorithm ◽

Customer Segmentation ◽

Monetary Model ◽

Business World ◽

Homogeneous Groups ◽

Rfm Model ◽

Proposed Model ◽

Marketing Analysis ◽

Purchase Patterns ◽

Potential Customers

Customer segmentation is one of the most important applications in the business world, specifically for marketing analysis. Since the coronavirus (Covid-19) spread in Indonesia, digital shopping activity has changed significantly because people prefer to buy what they need online, so predicting customer behavior is very important for marketing strategy. In this study, the K-Means clustering technique is applied to the RFM (Recency, Frequency, Monetary) model to segment potential customers. The proposed model starts with data cleaning, continues with exploratory analysis to understand the data, and finally applies K-Means clustering to the RFM model, producing three clusters based on the Elbow method. Cluster 0 contains 2,436 customers, cluster 1 contains 1,880, and cluster 2 contains 18. RFM analysis can segment customers into homogeneous groups quickly with a minimal set of variables. Good analysis can increase the effectiveness and efficiency of marketing plans, thereby increasing profitability at minimum cost.
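The RFM features that feed the clustering step can be derived from a transaction log as sketched below. The transactions, the snapshot date, and the helper names `rfm_table` and `min_max` are hypothetical; the definitions used (recency as days since the last purchase, frequency as the number of purchases, monetary as total spend) are the common convention and are assumed here rather than taken from the paper. Min-max scaling is added because K-Means relies on distances, so unscaled monetary values would otherwise dominate.

```python
from collections import defaultdict
from datetime import date

# (customer_id, purchase_date, amount): hypothetical transactions
transactions = [
    ("A", date(2021, 1, 5), 120.0),
    ("A", date(2021, 3, 1), 80.0),
    ("B", date(2021, 2, 10), 40.0),
    ("C", date(2020, 11, 20), 300.0),
    ("C", date(2021, 3, 9), 150.0),
    ("C", date(2021, 3, 10), 50.0),
]
snapshot = date(2021, 3, 11)  # assumed analysis date

def rfm_table(rows, snapshot):
    """Map customer -> (recency in days, frequency, monetary total)."""
    by_customer = defaultdict(list)
    for cid, day, amount in rows:
        by_customer[cid].append((day, amount))
    table = {}
    for cid, items in by_customer.items():
        last_purchase = max(day for day, _ in items)
        table[cid] = ((snapshot - last_purchase).days,
                      len(items),
                      sum(amount for _, amount in items))
    return table

def min_max(table):
    """Scale every RFM column to [0, 1] so no column dominates the distance."""
    cols = list(zip(*table.values()))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return {cid: tuple((v - l) / (h - l) if h > l else 0.0
                       for v, l, h in zip(row, lo, hi))
            for cid, row in table.items()}

rfm = rfm_table(transactions, snapshot)
scaled = min_max(rfm)
print(rfm)
print(scaled)
```

The scaled tuples can then be passed to any K-Means implementation; note that a low recency value means a recent purchase, so it may be worth inverting that column before interpreting the clusters.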

A Density-Peak-Based Clustering Method for Multiple Densities Dataset

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10090589 ◽

2021 ◽

Vol 10 (9) ◽

pp. 589

Author(s):

Zhicheng Shi ◽

Ding Ma ◽

Xue Yan ◽

Wei Zhu ◽

Zhigang Zhao

Keyword(s):

Data Mining ◽

Big Data ◽

Parameter Selection ◽

Clustering Methods ◽

Clustering Method ◽

Density Peak ◽

Unique Shape ◽

Density Peak Clustering ◽

Selection Of ◽

Better Than

Clustering methods in data mining are widely used to detect hotspots in many domains. They play an increasingly important role in the era of big data. As an advanced algorithm, the density peak clustering (DPC) algorithm is able to deal with arbitrary datasets, although it does not perform well when the dataset includes multiple densities. The cut-off distance dc is normally chosen from the user's experience, and this choice can affect the clustering result. In this study, a density-peak-based clustering method is proposed to detect clusters from datasets with multiple densities and shapes. Two improvements are made regarding the limitations of existing clustering methods. First, DPC finds it difficult to detect clusters in a dataset with multiple densities, where each cluster has a unique shape and its interior includes different densities. The proposed method adopts a step-by-step merging approach to solve this problem. Second, high-density points can be selected automatically without manual participation, which is more efficient than existing methods that require user-specified parameters. According to the experimental results, the clustering method can be applied to various datasets and performs better than traditional methods and DPC.
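For orientation, the two quantities at the core of baseline DPC can be sketched as follows. This shows only the standard algorithm with a hard cut-off kernel, not the improved method proposed in the paper; the points, the value of `d_c`, and the choice of two peaks are assumptions made for illustration. For each point, `rho` counts the neighbours closer than `d_c`, and `delta` is the distance to the nearest point of higher density (or the largest distance, for the densest point). Cluster centers are points where both are large.

```python
from math import dist

def density_peaks(points, d_c):
    """Return (rho, delta) for baseline DPC with a hard cut-off kernel."""
    n = len(points)
    d = [[dist(p, q) for q in points] for p in points]
    # Local density: number of other points closer than the cut-off distance.
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < d_c)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [d[i][j] for j in range(n) if rho[j] > rho[i]]
        # The densest point has no denser neighbour; use its largest distance.
        delta.append(min(higher) if higher else max(d[i]))
    return rho, delta

# Two hypothetical groups: a centre point surrounded by corner points.
points = [
    (0.5, 0.5), (0, 0), (1, 0), (0, 1), (1, 1),   # group 1
    (10.5, 10.5), (10, 10), (11, 10), (10, 11),   # group 2
]
rho, delta = density_peaks(points, d_c=0.8)  # d_c chosen by hand here

# Rank points by rho * delta and take the top two as cluster centers.
peaks = sorted(range(len(points)),
               key=lambda i: rho[i] * delta[i], reverse=True)[:2]
print(rho)
print(peaks)
```

Both `d_c` and the number of peaks are picked manually in this sketch, which is exactly the user dependence the abstract identifies as a limitation.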

PERSEBARAN LOKASI PRAKTEK BIDAN MELALUI PENERAPAN SISTEM INFORMASI GEOGRAFIS MENGGUNAKAN METODE CLUSTERING

JIPI (Jurnal Ilmiah Penelitian dan Pembelajaran Informatika) ◽

10.29100/jipi.v2i1.59 ◽

2017 ◽

Vol 2 (1) ◽

Author(s):

Andi Setiawan ◽

Sri Nining ◽

Tri Ginanjar Laksana

Keyword(s):

Quality Of Service ◽

Pregnant Women ◽

Data Clustering ◽

Measurement Data ◽

Clustering Methods ◽

Clustering Method ◽

Service Clustering ◽

Segment Data ◽

Vast Area

The distribution of Bidan Delima (quality-of-service) midwife practices in Cirebon is difficult to determine because of the vast area of Cirebon. As a result, many pregnant women do not get help quickly (giving birth without medical assistance) because they do not know the location of the nearest Bidan Delima practice. In addition, many Bidan Delima midwives have not yet cooperated with the BPJS insurance scheme for payment transactions. This study uses a clustering method to segment the data and so facilitate retrieval of information about Bidan Delima midwives. The clustering method comprises the stages of pattern representation, selection of traits or characteristics, pattern proximity, and distance measurement. The data were obtained from IBI (Indonesian Midwives Association), and the tools used were phpMyAdmin, Notepad++, XAMPP, the Google Maps API, and Dreamweaver. The system is expected to map the locations of Bidan Delima practices in the Cirebon district, find the nearest Bidan Delima midwife, and find Bidan Delima midwives who work with BPJS for payment transactions. It is hoped that this will help people handle pregnancies quickly and reduce maternal and child mortality.

CLG clustering for dropout prediction using log-data clustering method

IAES International Journal of Artificial Intelligence (IJ-AI) ◽

10.11591/ijai.v10.i3.pp764-770 ◽

2021 ◽

Vol 10 (3) ◽

pp. 764

Author(s):

Agung Triayudi ◽

Wahyu Oktri Widyarto ◽

Lia Kamelia ◽

Iksal Iksal ◽

Sumiati Sumiati

Keyword(s):

Data Mining ◽

Data Clustering ◽

Statistical Data ◽

Source Code ◽

Educational Data Mining ◽

Mining Machine ◽

Clustering Methods ◽

Clustering Method ◽

Log Data ◽

Cluster Data

The application of data mining, machine learning, and statistics to data from educational settings is commonly known as educational data mining. Most school systems require a teacher to teach many students at one time. Exams are regularly used to measure student achievement, but the results are difficult to interpret because examinations cannot be carried out easily. In programming classes, on the other hand, source code edits and UNIX commands can easily be detected and stored automatically as log data. Hence, rather than estimating student performance from this log data, this study focuses on detecting students who experience difficulty or are unable to follow programming classes. We propose the CLG clustering method, which can predict the risk of dropping out of school using cluster data for outlier detection.

Network performance data clustering method based on semantic description and optimization

Journal of Computer Applications ◽

10.3724/sp.j.1087.2012.01522 ◽

2013 ◽

Vol 32 (6) ◽

pp. 1522-1525

Author(s):

Da-qing JIANG ◽

Yong ZHOU ◽

Shi-xiong XIA

Keyword(s):

Data Clustering ◽

Network Performance ◽

Performance Data ◽

Semantic Description ◽

Clustering Method

Applicability Evaluation of Several Spatial Clustering Methods in Spatiotemporal Data Mining of Floating Car Trajectory

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10030161 ◽

2021 ◽

Vol 10 (3) ◽

pp. 161

Author(s):

Hao-xuan Chen ◽

Fei Tao ◽

Pei-long Ma ◽

Li-na Gao ◽

Tong Zhou

Keyword(s):

Spatial Analysis ◽

Spatial Clustering ◽

Heat Index ◽

Operating Efficiency ◽

Clustering Methods ◽

Clustering Method ◽

Trajectory Mining ◽

Density Peaks ◽

Analysis Methods ◽

Taxi Trajectory

Spatial analysis is an important means of mining floating car trajectory information, and clustering and density analysis are common methods for it. The choice of clustering method affects the accuracy and time efficiency of the analysis results, so clarifying the principles and characteristics of each method is a prerequisite for problem solving. This paper takes four representative spatial analysis methods as examples: KMeans, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Clustering by Fast Search and Find of Density Peaks (CFSFDP), and Kernel Density Estimation (KDE). Applied to the hotspot spatiotemporal mining problem of taxi trajectories, quantitative analysis and experimental verification show that DBSCAN and KDE have strong hotspot discovery capabilities, while the shape of the heat regions found by DBSCAN is relatively more robust. DBSCAN and CFSFDP can achieve high spatial accuracy in calculating the entrance and exit position of a Point of Interest (POI). KDE and DBSCAN are more suitable for the classification of heat index. When the dataset scale is similar, KMeans has the highest operating efficiency, while CFSFDP and KDE are inferior. This paper resolves to a certain extent the lack of scientific basis for selecting spatial analysis methods in current research. The conclusions can provide technical support and act as a reference for selecting methods to solve the taxi trajectory mining problem.

A simple clustering technique to extract subsets of data for function approximation

Journal of Hydroinformatics ◽

10.2166/hydro.2015.065 ◽

2015 ◽

Vol 17 (5) ◽

pp. 719-732

Author(s):

Dulakshi Santhusitha Kumari Karunasingha ◽

Shie-Yui Liong

Keyword(s):

Function Approximation ◽

Prediction Models ◽

Data Extraction ◽

Single Parameter ◽

Subtractive Clustering ◽

Data Sets ◽

Clustering Methods ◽

Clustering Method ◽

Data Set ◽

Functional Relationships

A simple clustering method is proposed for extracting representative subsets from lengthy data sets. The main purpose of the extracted subset is to build prediction models (in the form of approximated functional relationships) without using the entire large data set. Such smaller subsets of data are often required in the exploratory analysis stages of studies that involve resource-consuming investigations. A few recent studies have used a subtractive clustering method (SCM) for such data extraction, in the absence of clustering methods for function approximation. SCM, however, requires several parameters to be specified. This study proposes a clustering method that requires only a single parameter to be specified, yet is shown to be as effective as SCM. A method to find suitable values for the parameter is also proposed. Because it has only a single parameter, the proposed clustering method is shown to be orders of magnitude more efficient to use than SCM. The effectiveness of the proposed method is demonstrated on phase space prediction of three univariate time series and prediction of two multivariate data sets. Some drawbacks of SCM when applied to data extraction are identified, and the proposed method is shown to be a solution for them.
