Analisis Pemetaan Tingkat Kriminalitas di Kabupaten Karawang menggunakan Algoritma K-Means

Resti Noor Fahmi; Mohamad Jajuli; Nina Sulistiyowati

doi:10.31539/intecoms.v4i1.2413

INTECOMS Journal of Information Technology and Computer Science, 2021, Vol 4 (1), pp. 67-79

Keyword(s): Silhouette Coefficient

Crime is a recurring social problem that demands attention because it harms the community and has negative effects on it. According to jabar.tribunews.com, Karawang Regency ranked first in West Java for crime rate at the start of the pandemic. This poses a challenge for the government, and for the Karawang Police (Polres Karawang) in particular, to handle and curb crime in Karawang. This study applies clustering with the k-means algorithm and maps crime-prone areas using QGIS. For 2019, the clustering places 23 districts (kecamatan) in the not-prone cluster, 3 in the prone cluster, and 4 in the very prone cluster. For 2020, it places 22 districts in the not-prone cluster, 4 in the prone cluster, and 4 in the very prone cluster. Evaluating the clustering with the silhouette coefficient gives 0.52 for 2019 and 0.54 for 2020; both fall in the medium structure category, interpreted as a reasonable cluster assignment.
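
The "medium structure" label can be illustrated with a small lookup. This is only a sketch: the thresholds below are the commonly quoted rule of thumb for average silhouette values and are an assumption here, since the abstract does not give the table the authors used. The function name `interpret_silhouette` is hypothetical.

```python
def interpret_silhouette(score: float) -> str:
    """Map an average silhouette coefficient to a qualitative label.

    The cut-offs (0.70, 0.50, 0.25) are an assumed rule of thumb,
    not values taken from the paper.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("silhouette coefficient must lie in [-1, 1]")
    if score > 0.70:
        return "strong structure"
    if score > 0.50:
        return "medium structure"
    if score > 0.25:
        return "weak structure"
    return "no substantial structure"


# Values reported in the abstract for the two years.
for year, score in {2019: 0.52, 2020: 0.54}.items():
    print(year, score, interpret_silhouette(score))
```

Under these assumed cut-offs both reported values land in the same band, which matches the category named in the abstract.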

Optimization of SV-kNNC using Silhouette Coefficient and LMKNN for Stock Price Prediction

2020 3rd International Seminar on Research of Information Technology and Intelligent Systems (ISRITI), 2020

doi:10.1109/isriti51436.2020.9315516

Author(s): Frans Mikael Sinaga, Pahala Sirait, Arwin Halim

Keyword(s): Stock Price, Stock Price Prediction, Price Prediction, Silhouette Coefficient

PENERAPAN TEXT MINING UNTUK MELAKUKAN CLUSTERING DATA TWEET AKUN BLIBLI PADA MEDIA SOSIAL TWITTER MENGGUNAKAN K-MEANS CLUSTERING

Jurnal Gaussian, 2022, Vol 10 (4), pp. 583-593

doi:10.14710/j.gauss.v10i4.30409

Author(s): Syiva Multi Fani, Rukun Santoso, Suparti Suparti

Keyword(s): Social Media, Text Mining, Virtual Networks, Number Of Clusters, Silhouette Coefficient, Twitter Account, Computer Based, Twitter Users, Clustering Data, Coefficient Method

Social media is computer-based technology that facilitates the sharing of ideas, thoughts, and information through virtual networks and communities. Twitter is one of the most popular social media platforms in Indonesia, with 78 million users, and businesses rely heavily on it for advertising. By knowing which types of tweet content their followers retweet most, businesses can use that content to advertise to Twitter users. This study applies text mining to cluster tweets from the @bliblidotcom Twitter account using the K-means method, with the best number of clusters chosen by the Silhouette Coefficient method, in order to determine which types of tweet content are retweeted most by @bliblidotcom followers. The tweets with the most retweets and favorites are discount offers and flash sales, so Blibli Indonesia could use this kind of tweet to advertise on Twitter; prize quiz tweets are also liked by the followers of the @bliblidotcom account.

Penerapan Metode Silhouette Coefficient untuk Evaluasi Clustering Obat

PENA TEKNIK: Jurnal Ilmiah Ilmu-Ilmu Teknik, 2021, Vol 6 (2), pp. 48

doi:10.51557/pt_jiit.v6i2.659

Author(s): Solmin Paembonan, Hisma Abduh

Keyword(s): Data Set, Number Of Clusters, Silhouette Coefficient, Data Group

This study uses the k-means method, which can group similar drugs into data clusters. One way to measure how similar data items are is to compute the distance between them: the smaller the distance, the higher the similarity, and the larger the distance, the lower the similarity. The ultimate goal of clustering is to identify groups in a set of unlabeled data. Because clustering is an unsupervised method and there is no initial condition fixing the number of clusters that may form in a data set, the clustering results need to be evaluated. The evaluation of the clustering results gives a silhouette coefficient of 0.4854.
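
For reference, the standard definition of the silhouette coefficient ties the evaluation directly to the pairwise distances described above. The notation is an assumption of this note, not taken from the paper: $d(i,j)$ is the distance between items $i$ and $j$, $C_i$ is the cluster containing item $i$, and $n$ is the number of items.

```latex
\begin{aligned}
a(i) &= \frac{1}{|C_i| - 1} \sum_{j \in C_i,\; j \neq i} d(i, j)
  && \text{(mean distance to the item's own cluster)} \\
b(i) &= \min_{C \neq C_i} \; \frac{1}{|C|} \sum_{j \in C} d(i, j)
  && \text{(mean distance to the nearest other cluster)} \\
s(i) &= \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad
\mathrm{SC} = \frac{1}{n} \sum_{i=1}^{n} s(i)
\end{aligned}
```

Each $s(i)$ lies in $[-1, 1]$; values near 1 mean the item is much closer to its own cluster than to any other, and by the usual convention $s(i) = 0$ for an item that is alone in its cluster.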

Wind Turbine Anomaly Detection Based on SCADA Data Mining

Electronics, 2020, Vol 9 (5), pp. 751

doi:10.3390/electronics9050751

Author(s): Xiaoyuan Liu, Senxiang Lu, Yan Ren, Zhenning Wu

Keyword(s): Anomaly Detection, Wind Turbine, Supervisory Control, Detection Method, Classification Result, Detection Model, Scada System, Silhouette Coefficient, Abnormal State, Low Dimensional

In this paper, a wind turbine anomaly detection method based on generalized feature extraction is proposed. First, wind turbine (WT) attributes collected from the Supervisory Control And Data Acquisition (SCADA) system are clustered with k-means, and the Silhouette Coefficient (SC) is adopted to judge the effectiveness of the clustering. After clustering, the correlation between attributes within a class becomes larger and the correlation between classes becomes smaller. Then, the dimensions of the attributes within each class are reduced using t-Distributed Stochastic Neighbor Embedding (t-SNE), so that the low-dimensional attributes reflect the WT attributes more completely and concisely. Finally, the detection model is trained, and the normal or abnormal state is indicated by a classification result of 0 or 1 respectively. Experiments consisting of three cases with SCADA data demonstrate the effectiveness of the proposed method.

Segmentasi Buah Mangga Menggunakan MLE dan GMM Sebagai Klasterisasi Pixel

Jurnal Algoritme, 2020, Vol 1 (1), pp. 57-67

doi:10.35957/algoritme.v1i1.435

Author(s): Steven Pranata, Derry Alamsyah

Keyword(s): Maximum Likelihood, Maximum Likelihood Estimation, Error Rate, Data Distribution, Likelihood Estimation, Clustering Method, Number Of Clusters, Silhouette Coefficient, Good Cluster, Cluster Quality

Segmentation divides an image into parts or segments that are simpler and more meaningful so they can be analyzed further. The solution adopted here uses the Maximum Likelihood Estimation (MLE) method and the Gaussian Mixture Model (GMM). GMM is a clustering method: a function made up of several Gaussians, each identified by k ∈ {1, ..., K}, where K is the number of clusters in the dataset. Maximum likelihood estimation is a technique for finding the point that maximizes a function, and it is very widely used to estimate the parameters of a data distribution. Tests were carried out on mango images with 10 different backgrounds. GMM clusters the pixels of the mango image to produce means and covariances, which MLE then uses to classify each pixel of the image. Based on the results, the GMM and MLE methods have an error rate of 13.07% for 3 clusters, 8.06% for 4 clusters, and 6.63% for 5 clusters, and good cluster quality, with silhouette coefficient values of 0.37686 for 3 clusters, 0.29577 for 4 clusters, and 0.26162 for 5 clusters.
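
As a reading aid, the mixture described above can be written out in its standard form. The symbols are assumptions of this note: $x$ is a pixel's feature vector, and $\pi_k$, $\mu_k$, $\Sigma_k$ are the weight, mean, and covariance of component $k$. The abstract does not say whether the mixture weights enter the per-pixel decision, so the second line is one plausible rule rather than the authors' stated one.

```latex
\begin{aligned}
p(x) &= \sum_{k=1}^{K} \pi_k \, \mathcal{N}\!\left(x \mid \mu_k, \Sigma_k\right),
  \qquad \sum_{k=1}^{K} \pi_k = 1, \quad \pi_k \ge 0 \\
\hat{k}(x) &= \arg\max_{k \in \{1, \dots, K\}} \; \pi_k \, \mathcal{N}\!\left(x \mid \mu_k, \Sigma_k\right)
\end{aligned}
```

Here $\mathcal{N}(x \mid \mu_k, \Sigma_k)$ is the multivariate Gaussian density, and $\hat{k}(x)$ is the segment assigned to the pixel.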

Prospective directions of state support of the national innovation system of Russia

SHS Web of Conferences, 2021, Vol 128, pp. 04009

doi:10.1051/shsconf/202112804009

Author(s): Dmitry Serpuhovitin

Keyword(s): Human Development Index, Spatial Clustering, Innovation System, National Innovation System, Clustering Methods, State Support, Quality Metric, Silhouette Coefficient, Global Innovation, Python Programming

The article presents an original methodology for selecting the most in-demand measures of state support for a country's national innovation system using numerical clustering methods. The clustering methodology is based on a combination of indexes: the Global Innovation Index, Gross National Income, and the Human Development Index. From the lists of countries and their corresponding clusters obtained in the empirical analysis, the most in-demand measures of state support were identified on the basis of the retrospective dynamics of the Global Innovation Index indicators that characterize state support of the national innovation system. For the indicators obtained, recommendations are given on the direction of development of the national innovation system of Russia. Classical clustering methods were used as analysis instruments: Density-based spatial clustering of applications with noise (DBSCAN) and K-Means. The Silhouette Coefficient, as implemented in the sklearn library for the Python programming language, was used as the quality metric.

Pemodelan Cluster Loyalitas Customer Menggunakan Algoritma K-Means Dengan Parameter LRIFMQ

Jurnal INFORM, 2020, Vol 5 (2), pp. 54

doi:10.25139/inform.v0i1.2691

Author(s): Aloysius Matz Teguh Utomo

Keyword(s): Calculated Data, Optimal Number, Customer Segmentation, Number Of Clusters, Silhouette Coefficient, Customer Lifetime, One Year, The Right, Hierarchy Process, A Company

Loyal customers are one of the factors that determine the development of a business. Businesses therefore need a strategy to keep customers loyal, and even to make previously less loyal customers more loyal. The strategy must be targeted according to customer segmentation. The purpose of this paper is to model clusters of customer loyalty to help businesses make the right marketing-strategy decisions. Segmentation is done using the k-means algorithm with LRIFMQ (length, recency, interval, frequency, monetary, quantity) as parameters, and the CLV (customer lifetime value) of each cluster is calculated. The data were obtained from PT. XYZ (a company engaged in food processing) for one year (1 January 2019 to 31 December 2019), covering 337,739 transactions and 26,683 customers. The AHP (analytical hierarchy process) method is used for LRIFMQ weighting because it includes a consistency index calculation. The silhouette coefficient is used to measure cluster quality and to determine the optimal number of clusters. The best result is a silhouette coefficient of 0.632904, obtained with 6 clusters.
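
Choosing the number of clusters by silhouette coefficient, as described above, can be sketched in plain Python. Everything here is illustrative and assumed rather than taken from the paper: the toy 2-D points stand in for weighted LRIFMQ vectors, the helper names `kmeans`, `mean_silhouette`, and `best_k` are hypothetical, and k-means is seeded with the first k points only to keep the run deterministic.

```python
from math import dist
from statistics import mean


def kmeans(points, k, iterations=50):
    """Plain Lloyd's algorithm, seeded with the first k points
    (a deterministic choice for illustration only)."""
    centroids = list(points[:k])
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(mean(axis) for axis in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return labels


def mean_silhouette(points, labels):
    """Average of s(i) = (b - a) / max(a, b); singleton clusters score 0."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)
            continue
        a = mean(dist(points[i], points[j]) for j in own)
        b = min(
            mean(dist(points[i], points[j]) for j in members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)


def best_k(points, candidates):
    """Return (k, score) with the highest mean silhouette among candidates."""
    scored = [(k, mean_silhouette(points, kmeans(points, k))) for k in candidates]
    return max(scored, key=lambda item: item[1])


# Toy data: three well-separated groups of three points each.
points = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0),
          (1.0, 0.0), (11.0, 0.0), (0.0, 11.0),
          (0.0, 1.0), (10.0, 1.0), (1.0, 10.0)]
k, score = best_k(points, range(2, 5))
print(k, round(score, 3))
```

On this toy data the three-cluster solution scores highest, because k = 2 merges two distant groups and k = 4 splits a tight one. In practice a library implementation with several random restarts would be the more robust choice.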

Improve Hybrid Particle Swarm Optimization and K-Means for Clustering

Journal of Information Technology and Computer Science, 2019, Vol 4 (1), pp. 42

doi:10.25126/jitecs.20194183

Author(s): Yudha Alif Auliya, Wayan Firdaus Mahmudy, Sudarto Sudarto

Keyword(s): Particle Swarm Optimization, Climatic Factors, Potato Crop, Particle Swarm, Potato Production, Land Suitability, Swarm Optimization, Hybrid Particle, Hybrid Particle Swarm Optimization, Silhouette Coefficient

Potato production is strongly influenced by the selection of suitable land. Land suitability for planting potatoes depends on climatic factors and land characteristics. Planted areas are clustered on 11 land-suitability criteria, giving four clusters: very suitable (S1), suitable (S2), fairly suitable (S3), and not suitable (N). Clustering the land aims to improve the quality and quantity of the potato crop. Clustering is done with a hybrid of Particle Swarm Optimization and K-Means (KCPSO); the hybrid method is used to obtain accurate clusters. This study takes a new approach that improves KCPSO with a random injection method. The cost value is calculated from the silhouette coefficient. KCPSO gives better results than the K-Means algorithm without the hybrid, and the best centroids are those with the largest silhouette coefficient.

Keywords: clustering, K-Means, Particle Swarm Optimization, random injection, silhouette coefficient.

Application of Density Based Clustering of Disaster Location in Realtime Social Media

TEM Journal, 2020, pp. 929-936

doi:10.18421/tem93-13

Author(s): Mochammad Haldi Widianto, Ivan Diryana Sudirman, Muhammad Hanif Awaluddin

Keyword(s): Social Media, Natural Disasters, Clustering Methods, Silhouette Coefficient, Density Based Clustering, Data Density, Disaster Data, User Data, High Degree, Degree Of Similarity

Online media are used to find information, Twitter being one such medium. Natural disasters are highly damaging, so an application is needed to monitor natural disasters through Twitter. This research starts from the small number of existing studies that use clustering methods based on the density of Twitter user data. The availability of data for particular areas makes grouping easier. The data are then grouped by a high degree of similarity. One result of applying this method is the location of the disaster. NER-based rules are used to find the disaster area. Data accuracy is tested using the silhouette coefficient.

Analisis Peramalan dan Pengelompokan Jumlah Turis ke Jepang

Journal of Integrated System, 2021, Vol 4 (2), pp. 150-167

doi:10.28932/jis.v4i2.3164

Author(s): Laurence, Devanny Gumulya, J. Sandra Sembel, Magdalena Lestari Ginting

Keyword(s): Principal Component Analysis, Principal Component, Component Analysis, Silhouette Coefficient

Tourism is an important contributor to a country's economy. This study examines foreign tourist visits to Japan, using data on the number of visiting tourists and on tourist spending in the categories of accommodation, entertainment, food and beverages, shopping, transportation, and others. Earlier studies did not group countries by these spending categories, so this study fills that gap by grouping countries according to tourist spending. The study also builds a forecasting model using the ARIMA method, accommodating trend and seasonality. The data, consisting of six spending types, were reduced to 2 components that explain 83.84% of the variance. The results show 2 groups of tourist-origin countries based on spending: one of 8 OECD member countries and one of 12 non-OECD countries. Tourists from OECD countries play an important role in the world economy, contributing 50.5% of total world tourist spending. Cluster quality is categorized as good, with an average silhouette coefficient and cohesion value of 0.56. This grouping can serve as a basis for studying consumer behavior in each country. ARIMA forecasting can be used by including trend and seasonal elements in the model. The R² values of the forecasting models are good for most of the tourist data from the 20 countries. This seasonal ARIMA model can be considered for forecasting the number of arriving tourists.

Keywords: principal component analysis, k-means clustering, silhouette coefficient and cohesion values, ARIMA
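
The 83.84% figure can be read through the usual explained-variance ratio of principal component analysis, the method named in the keywords. The notation is an assumption of this note: $\lambda_1 \ge \dots \ge \lambda_6$ are the eigenvalues of the covariance (or correlation) matrix of the six spending variables. The abstract does not say whether the variables were standardized first.

```latex
\text{explained variance ratio}
  = \frac{\lambda_1 + \lambda_2}{\sum_{j=1}^{6} \lambda_j}
  = 0.8384
```

Keeping the two leading components therefore retains about 84% of the total variance in the six spending categories before clustering.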
