PENERAPAN METODE K-MEANS DENGAN METODE ELBOW UNTUK SEGMENTASI PELANGGAN MENGGUNAKAN MODEL RFM (Recency, Frequency, & Monetary)

Segmenting a company's customers can make future decision-making easier. This study uses data from an automotive company, PT Hasjrat Abadi Ambon, consisting of transaction and customer data for motor vehicles. The RFM model groups customers by the values of the Recency, Frequency, and Monetary variables and assigns each customer a status on a scale from best to worst. Customers with an assigned status are then grouped into several clusters using the K-Means method, with the Elbow method used to determine the optimal number of clusters. Clusters are formed using two distance measures, Euclidean distance and Manhattan distance, and the quality of the resulting clusters is compared using the Silhouette Coefficient. The result is data divided into 5 groups, with five test runs conducted to determine the best centroids. The best clustering is visualized to help the company make decisions. According to the Silhouette Coefficient, the better distance measure is Manhattan distance, with s(i) = 0.152695.
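
The comparison of Euclidean and Manhattan distance by Silhouette Coefficient described above can be sketched as follows. This is a minimal illustration, not the study's implementation: the points, the function names `manhattan` and `mean_silhouette`, and the two-cluster split are all assumptions made for the example, and the function expects at least two clusters.

```python
from math import dist  # Euclidean distance, Python 3.8+

def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def mean_silhouette(clusters, metric=dist):
    """Average s(i) over all points; clusters is a list of lists of tuples."""
    scores = []
    for i, own in enumerate(clusters):
        for p in own:
            if len(own) == 1:
                scores.append(0.0)  # common convention for singleton clusters
                continue
            # a: mean distance to the other members of the same cluster
            a = sum(metric(p, q) for q in own) / (len(own) - 1)
            # b: smallest mean distance to the members of any other cluster
            b = min(sum(metric(p, q) for q in other) / len(other)
                    for j, other in enumerate(clusters) if j != i)
            denom = max(a, b)
            scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical points, already split into two clusters
clusters = [[(0, 0), (0, 1)], [(5, 0), (5, 1)]]
s_euclidean = mean_silhouette(clusters)
s_manhattan = mean_silhouette(clusters, metric=manhattan)
```

Values close to 1 indicate compact, well-separated clusters, so the metric with the higher mean s(i) would be preferred on a given data set.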

ANALISIS SEGMENTASI PELANGGAN MENGGUNAKAN KOMBINASI RFM MODEL DAN TEKNIK CLUSTERING

Jurnal Terapan Teknologi Informasi ◽

10.21460/jutei.2018.21.76 ◽

2018 ◽

Vol 2 (1) ◽

pp. 23-32 ◽

Cited By ~ 2

Author(s):

Beta Estri Adiana ◽

Indah Soesanti ◽

Adhistya Erna Permanasari

Keyword(s):

Small And Medium Enterprises ◽

Cluster Formation ◽

Optimal Number ◽

Customer Segmentation ◽

Marketing Strategies ◽

Monetary Analysis ◽

Data Mining Approach ◽

Rfm Model ◽

Customer Services ◽

Medium Enterprises

Intense competition in business motivates small and medium enterprises (SMEs) to manage customer services as well as possible. Customer loyalty can be improved by grouping customers into several groups and determining appropriate and effective marketing strategies for each group. Customer segmentation can be performed with a data mining approach using a clustering method. The main purpose of this paper is to segment customers and measure their loyalty to an SME's product. The study uses the CRISP-DM method, which consists of six phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. The K-Means algorithm is used for cluster formation, and RapidMiner is the tool used to evaluate the resulting clusters. Cluster formation is based on RFM (recency, frequency, monetary) analysis. The Davies Bouldin Index (DBI) is used to find the optimal number of clusters (k). The customers are divided into 3 clusters: 30 customers in the first cluster, categorized as typical customers; 8 customers in the second cluster, categorized as superstar customers; and 89 customers in the third cluster, categorized as dormant customers.
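
The Davies Bouldin Index mentioned above can be sketched in a few lines. This is an illustrative version using Euclidean distance and the standard definition (average, over clusters, of the worst ratio of within-cluster scatter to between-centroid distance); the sample points and the names `centroid` and `davies_bouldin` are assumptions for the example, not taken from the paper or from RapidMiner.

```python
from math import dist  # Euclidean distance, Python 3.8+

def centroid(points):
    """Coordinate-wise mean of a list of tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def davies_bouldin(clusters):
    """DBI for a list of clusters (lists of tuples); lower is better."""
    centers = [centroid(c) for c in clusters]
    # scatter: mean distance of each cluster's points to its centroid
    scatter = [sum(dist(p, m) for p in c) / len(c)
               for c, m in zip(clusters, centers)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # worst-case similarity of cluster i to any other cluster
        total += max((scatter[i] + scatter[j]) / dist(centers[i], centers[j])
                     for j in range(k) if j != i)
    return total / k

# Hypothetical groupings of the same four points
tight = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
loose = [[(0, 0), (10, 0)], [(0, 2), (10, 2)]]
dbi_tight = davies_bouldin(tight)
dbi_loose = davies_bouldin(loose)
```

To choose k, one would typically run the clustering for several candidate values of k and keep the one with the lowest index.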

Customer segmentation using bisecting k-means algorithm based on recency, frequency, and monetary (RFM) model

Jurnal Teknologi dan Sistem Komputer ◽

10.14710/jtsiskom.8.2.2020.78-83 ◽

2019 ◽

Vol 8 (2) ◽

pp. 78-83

Author(s):

Novianti Puspitasari ◽

Joan Angelina Widians ◽

Noval Bayu Setiawan

Keyword(s):

Customer Loyalty ◽

Customer Segmentation ◽

Number Of Clusters ◽

Transaction Data ◽

Model Based ◽

Silhouette Coefficient ◽

Rfm Model ◽

A Company ◽

Coefficient Method

Information on customer loyalty characteristics in a company is needed to improve service to customers. A customer segmentation model based on transaction data can provide this information. This study used parameters from the recency, frequency, and monetary (RFM) model in determining customer segmentation and bisecting k-means algorithm to determine the number of clusters. The dataset used 588 sales transactions for PT Dinar Energi Utama in 2017. The clusters formed by the bisecting k-means and k-means algorithm were tested using the silhouette coefficient method. The bisecting k-means algorithm can form the best customer segmentation into three groups, namely Occasional, Typical, and Gold, with a silhouette coefficient of 0.58132.
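
As a rough sketch of how RFM parameters can be derived from transaction data before clustering, the following computes recency (days since the last purchase), frequency (number of purchases), and monetary (total spent) per customer. The transactions, the snapshot date, and the name `rfm_table` are hypothetical; the paper's actual preprocessing may differ.

```python
from datetime import date

def rfm_table(transactions, snapshot):
    """Return {customer: (recency_days, frequency, monetary)}."""
    table = {}
    for customer, day, amount in transactions:
        last, freq, total = table.get(customer, (None, 0, 0.0))
        # keep the most recent purchase date seen so far
        last = day if last is None or day > last else last
        table[customer] = (last, freq + 1, total + amount)
    return {c: ((snapshot - last).days, freq, total)
            for c, (last, freq, total) in table.items()}

# Hypothetical transactions: (customer id, purchase date, amount)
transactions = [
    ("A", date(2020, 1, 5), 100.0),
    ("A", date(2020, 3, 1), 250.0),
    ("B", date(2019, 11, 20), 80.0),
]
rfm = rfm_table(transactions, snapshot=date(2020, 3, 31))
```

Because the three values live on very different scales, they are usually normalized or converted to scores before being passed to a distance-based clustering algorithm.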

Pengelompokan Wilayah Madura Berdasar Indikator Pemerataan Pendidikan Menggunakan Partition Around Medoids Dan Validasi Adjusted Random Index

Journal of Information Systems Engineering and Business Intelligence ◽

10.20473/jisebi.1.1.17-24 ◽

2015 ◽

Vol 1 (1) ◽

pp. 17 ◽

Cited By ~ 1

Author(s):

Budi Dwi Satoto ◽

Bain Khusnul Khotimah ◽

Iswati Iswati

Keyword(s):

Data Mining ◽

Euclidean Distance ◽

Distance Measure ◽

Adjusted Rand Index ◽

Manhattan Distance ◽

Random Index ◽

Original Label ◽

Long Time ◽

Canberra Distance

Abstract: The equitable distribution of education in Indonesia has long been a concern of the government, yet education in Indonesia is still not evenly distributed. This can be seen from the low gross enrollment rate (Angka Partisipasi Kasar, APK) and net enrollment rate (Angka Partisipasi Murni, APM) in certain areas, as well as from unevenly distributed educational facilities and infrastructure. The purpose of this research is to provide local governments with information about the state of education in their regions so that they can produce appropriate policies on the development of educational infrastructure and the distribution of assistant teachers. Clustering is a data mining method that divides data into groups of objects with the same characteristics. This research uses the Partition Around Medoids (PAM) clustering method with 3 distance measures: Manhattan, Euclidean, and Canberra distance. The Adjusted Rand Index (ARI) is used to measure the quality of the clustering results; the larger the ARI value, the better the cluster quality. Over 3 trials, the average ARI was 0.799 for Euclidean distance, 0.738 for Manhattan distance, and 0.163 for Canberra distance. The best clustering was obtained using Euclidean distance, with an ARI of 0.825 and 83.33% agreement with the original labels. The best clustering yields a high-equity group of 11 sub-districts, a medium-equity group of 15 sub-districts, and a low-equity group of 46 sub-districts.

Keywords: indicator of educational equity, clustering, Partition Around Medoids, distance measure, Adjusted Rand Index.
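
The Adjusted Rand Index used for validation can be sketched from its standard contingency-table formula. The label lists below are hypothetical, the name `adjusted_rand_index` is chosen for the example, and the function assumes at least two labeled items; returning 1.0 in the degenerate case where both labelings are trivial is a convention adopted here, not something stated in the paper.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """ARI between two labelings of the same items (1.0 = identical grouping)."""
    n = len(labels_true)
    # pairs of items that share a cell, a row, or a column of the contingency table
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both labelings are trivial
    return (sum_cells - expected) / (max_index - expected)

# Hypothetical original labels and two candidate clusterings
truth = [0, 0, 1, 1]
same_grouping = [1, 1, 0, 0]   # same partition, cluster names swapped
crossed = [0, 1, 0, 1]         # splits every true group
ari_same = adjusted_rand_index(truth, same_grouping)
ari_crossed = adjusted_rand_index(truth, crossed)
```

The index ignores how clusters are named and compares only which items are grouped together, which is why the swapped labeling still scores 1.0.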

The Number of Topics Optimization: Clustering Approach

Machine Learning and Knowledge Extraction ◽

10.3390/make1010025 ◽

2019 ◽

Vol 1 (1) ◽

pp. 416-426 ◽

Cited By ~ 4

Author(s):

Fedor Krasnov ◽

Anastasiia Sen

Keyword(s):

Euclidean Distance ◽

Topic Model ◽

Optimal Number ◽

Silhouette Coefficient ◽

Cosine Measure ◽

Small Collection ◽

Clustering Approach ◽

The Subject ◽

Better Than

Although topic models have been used to build clusters of documents for more than ten years, there is still a problem of choosing the optimal number of topics. The authors analyzed many fundamental studies undertaken on the subject in recent years. The main problem is the lack of a stable metric of the quality of topics obtained during the construction of the topic model. The authors analyzed the internal metrics of the topic model: coherence, contrast, and purity to determine the optimal number of topics and concluded that they are not applicable to solve this problem. The authors analyzed the approach to choosing the optimal number of topics based on the quality of the clusters. For this purpose, the authors considered the behavior of the cluster validation metrics: the Davies Bouldin index, the silhouette coefficient, and the Calinski-Harabaz index. A new method for determining the optimal number of topics proposed in this paper is based on the following principles: (1) Setting up a topic model with additive regularization (ARTM) to separate noise topics; (2) Using dense vector representation (GloVe, FastText, Word2Vec); (3) Using a cosine measure for the distance in cluster metric that works better than Euclidean distance on vectors with large dimensions. The methodology developed by the authors for obtaining the optimal number of topics was tested on the collection of scientific articles from the OnePetro library, selected by specific themes. The experiment showed that the method proposed by the authors allows assessing the optimal number of topics for the topic model built on a small collection of English documents.
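
One property that separates the cosine measure from Euclidean distance is that cosine ignores vector magnitude. The toy sketch below illustrates only that property with hypothetical 3-dimensional vectors; it does not reproduce the paper's experiments on high-dimensional embeddings, and the name `cosine_distance` is an assumption for the example.

```python
from math import dist, sqrt  # dist is Euclidean distance, Python 3.8+

def cosine_distance(u, v):
    """1 minus the cosine of the angle between u and v (both non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Hypothetical document vectors
doc_short = (1.0, 2.0, 0.0)
doc_long = (10.0, 20.0, 0.0)   # same direction, ten times the magnitude
doc_other = (0.0, 0.0, 3.0)    # orthogonal direction

# Euclidean distance ranks doc_other as closer to doc_short than doc_long is
euclid_long = dist(doc_short, doc_long)
euclid_other = dist(doc_short, doc_other)

# Cosine distance ranks doc_long as (almost exactly) identical in direction
cos_long = cosine_distance(doc_short, doc_long)
cos_other = cosine_distance(doc_short, doc_other)
```

With these vectors the two measures disagree about which document is nearest, which is the kind of difference that can change cluster validation scores.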

Penerapan Agglomerative Hierarchical Clustering Untuk Segmentasi Pelanggan

Jurnal Ilmiah SINUS ◽

10.30646/sinus.v18i1.448 ◽

2020 ◽

Vol 18 (1) ◽

pp. 75

Author(s):

Widyawati Widyawati ◽

Wawan Laksito Yuly Saptomo ◽

Yustina Retno Wahyu Utami

Keyword(s):

Hierarchical Clustering ◽

Cluster Formation ◽

Customer Segmentation ◽

Marketing Strategies ◽

Manhattan Distance ◽

Agglomerative Hierarchical Clustering ◽

Customer Group ◽

Silhouette Coefficient ◽

Customer Type ◽

The Right

As more businesses emerge, companies need the right marketing strategy to provide the best service to customers. The first step is to identify customer types and design marketing strategies appropriate to each type. This research proposes clustering customers so that an appropriate strategy can be determined for each customer group. Clusters are formed using Agglomerative Hierarchical Clustering with the Average Linkage approach, with distances computed using Manhattan Distance. The variables in this research are Recency, Frequency, and Monetary (RFM). Testing with the Silhouette coefficient shows that 7 clusters give the best result among the configurations from 2 to 20 clusters, because that configuration has the smallest negative value. Based on the Silhouette coefficient results, customer segmentation uses 7 clusters, with each cluster representing an existing customer type.
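
Agglomerative clustering with average linkage and Manhattan distance can be sketched naively as follows. This is a small illustrative version that repeatedly merges the two closest clusters until k remain; the points and the names `manhattan`, `average_linkage`, and `agglomerate` are hypothetical, and a real analysis would likely use an optimized library.

```python
def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def average_linkage(a, b):
    """Mean Manhattan distance over all cross-cluster pairs."""
    return sum(manhattan(p, q) for p in a for q in b) / (len(a) * len(b))

def agglomerate(points, k):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    clusters = [[p] for p in points]  # start with one cluster per point
    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs,
                   key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters.pop(j)      # j > i, so index i stays valid
        clusters[i].extend(merged)
    return clusters

# Hypothetical two-dimensional customer points
points = [(0, 0), (0, 1), (10, 10), (10, 11), (50, 50)]
segments = agglomerate(points, 3)
```

Running the same procedure for each candidate number of clusters and scoring the result with the Silhouette coefficient would mirror the selection step the abstract describes.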

The system of indicators of the quality of the carbon-carbon composite materials and technologies of their production

Informacionno-technologicheskij vestnik ◽

10.21499/2409-1650-2018-3-127-132 ◽

2018 ◽

pp. 127-132

Author(s):

T. N. Antipova ◽

D. S. Shiroyan

Keyword(s):

Composite Material ◽

Low Cost ◽

Experimental Studies ◽

Carbon Matrix ◽

Carbon Composite ◽

Optimal Number ◽

Laboratory Equipment ◽

Carbon Composite Material ◽

Carbon Composite Materials

A system of quality indicators for carbon-carbon composite material and the technological operations of its production is substantiated in this work. As a result of the experimental studies, with respect to the existing laboratory equipment, the optimal number of cycles of saturation of the reinforcing frame with a carbon matrix is determined. It was found that to obtain a carbon-carbon composite material with a low cost and the required quality indicators, it is necessary to introduce additional parameters of the pitch melt at the impregnation stage.

Comparison of Two Intergranular Corrosion Tests on EN AW-6016 Sheet Material

Applied Sciences ◽

10.3390/app11115294 ◽

2021 ◽

Vol 11 (11) ◽

pp. 5294

Author(s):

Peer Decker ◽

Ines Zerbin ◽

Luisa Marzoli ◽

Marcel Rosefort

Keyword(s):

Optical Microscopy ◽

Intergranular Corrosion ◽

Sheet Material ◽

Test Methods ◽

Standard Test ◽

Corrosion Depth ◽

Corrosion Tests ◽

Automotive Company ◽

Test Parameters

Two different intergranular corrosion tests were performed on EN AW-6016 sheet material, an ISO 11846:1995-based test with varying solution amounts and acid concentrations, and a standard test of an automotive company (PV1113, VW-Audi). The average intergranular corrosion depth was determined via optical microscopy. The differences in the intergranular corrosion depths were then discussed with regard to the applicability and quality of the two different test methods. The influence of varying test parameters for ISO 11846:1995 was discussed as well. The determined IGC depths were found to be strongly dependent on the testing parameters, which will therefore have a pronounced influence on the determined IGC susceptibility of a material. In general, ISO 11846:1995 tests resulted in a significantly lower corrosion speed, and the corrosive attack was found to be primarily along grain boundaries.

Linkage analysis using geographical proximity: a test of the efficacy of distance measures

Journal of Criminological Research Policy and Practice ◽

10.1108/jcrpp-01-2020-0006 ◽

2020 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Shumpei Haginoya ◽

Aiko Hanayama ◽

Tamae Koike

Keyword(s):

Environmental Factors ◽

Euclidean Distance ◽

Distance Measure ◽

Distance Measures ◽

Manhattan Distance ◽

Geographical Proximity ◽

Discrimination Accuracy ◽

Content Type ◽

Shortest Route ◽

The Impact

Purpose: The purpose of this paper was to compare the accuracy of linking crimes using geographical proximity between three distance measures: Euclidean (distance measured by the length of a straight line between two locations), Manhattan (distance obtained by summing north-south distance and east-west distance) and the shortest route distances.

Design/methodology/approach: A total of 194 cases committed by 97 serial residential burglars in Aomori Prefecture in Japan between 2004 and 2015 were used in the present study. The Mann–Whitney U test was used to compare linked (two offenses committed by the same offender) and unlinked (two offenses committed by different offenders) pairs for each distance measure. Discrimination accuracy between linked and unlinked crime pairs was evaluated using area under the receiver operating characteristic curve (AUC).

Findings: The Mann–Whitney U test showed that the distances of the linked pairs were significantly shorter than those of the unlinked pairs for all distance measures. Comparison of the AUCs showed that the shortest route distance achieved significantly higher accuracy compared with the Euclidean distance, whereas there was no significant difference between the Euclidean and the Manhattan distance or between the Manhattan and the shortest route distance. These findings give partial support to the idea that distance measures taking the impact of environmental factors into consideration might be able to identify a crime series more accurately than Euclidean distances.

Research limitations/implications: Although the results suggested a difference between the Euclidean and the shortest route distance, it was small, and all distance measures resulted in outstanding AUC values, probably because of the ceiling effects. Further investigation that makes the same comparison in a narrower area is needed to avoid this potential inflation of discrimination accuracy.

Practical implications: The shortest route distance might contribute to improving the accuracy of crime linkage based on geographical proximity. However, further investigation is needed to recommend using the shortest route distance in practice. Given that the targeted area in the present study was relatively large, the findings may contribute especially to improve the accuracy of proactive comparative case analysis for estimating the whole picture of the distribution of serial crimes in the region by selecting more effective distance measure.

Social implications: Implications to improve the accuracy in linking crimes may contribute to assisting crime investigations and the earlier arrest of offenders.

Originality/value: The results of the present study provide an initial indication of the efficacy of using distance measures taking environmental factors into account.
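
The AUC used to evaluate discrimination accuracy can be computed directly from pairwise comparisons, which is equivalent to normalizing the Mann-Whitney U statistic by the number of linked-unlinked pairs. The sketch below assumes shorter distances indicate linked crimes; the distances and the name `auc_shorter_is_linked` are hypothetical and not taken from the study's data.

```python
def auc_shorter_is_linked(linked, unlinked):
    """Probability that a random linked pair is closer than a random unlinked pair.

    Ties count as half a win, matching the usual Mann-Whitney convention.
    """
    wins = 0.0
    for d_linked in linked:
        for d_unlinked in unlinked:
            if d_linked < d_unlinked:
                wins += 1.0
            elif d_linked == d_unlinked:
                wins += 0.5
    return wins / (len(linked) * len(unlinked))

# Hypothetical inter-crime distances in kilometres
linked_km = [1.0, 2.0, 3.0]
unlinked_km = [2.0, 5.0, 6.0]
auc = auc_shorter_is_linked(linked_km, unlinked_km)
```

An AUC of 0.5 would mean the distance measure separates linked from unlinked pairs no better than chance, while 1.0 would mean perfect separation.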
