silhouette coefficient
Recently Published Documents


2022 ◽  
Vol 10 (4) ◽  
pp. 583-593
Author(s):  
Syiva Multi Fani ◽  
Rukun Santoso ◽  
Suparti Suparti

Social media is computer-based technology that facilitates the sharing of ideas, thoughts, and information through virtual networks and communities. Twitter is one of the most popular social media platforms in Indonesia, with 78 million users, and businesses rely heavily on it for advertising. By knowing which types of tweet content their followers retweet most, businesses can use that content to advertise to Twitter users. This study applies text mining and K-means clustering, with the best number of clusters chosen by the Silhouette Coefficient method, to tweets from the @bliblidotcom Twitter account in order to determine which types of tweet content its followers retweet most. Tweets with the most retweets and favorites are discount offers and flash sales, so Blibli Indonesia could use this kind of tweet for advertising on Twitter, and prize quiz tweets are liked by the followers of the @bliblidotcom account.
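
As an illustration of the selection step described above, the following is a minimal sketch of the silhouette coefficient and of picking the number of clusters that maximizes it. It is not the authors' code: the function names, the Euclidean distance, the convention that singleton clusters score 0, and the `labelings` dictionary (candidate labelings produced beforehand by any clustering run) are all assumptions made for the example.

```python
import math


def silhouette_coefficient(points, labels):
    """Mean silhouette value s(i) = (b - a) / max(a, b) over all points."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def best_number_of_clusters(points, labelings):
    """labelings maps k to the labels from a clustering run with k clusters (hypothetical input)."""
    return max(labelings, key=lambda k: silhouette_coefficient(points, labelings[k]))
```

Values close to 1 suggest compact, well-separated clusters, values near 0 suggest overlapping clusters, and negative values suggest points assigned to the wrong cluster.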


2022 ◽  
Vol 10 (4) ◽  
pp. 544-553
Author(s):  
Ratna Kurniasari ◽  
Rukun Santoso ◽  
Alan Prahutama

Effective communication between the government and society is essential for good governance. The government provides a channel for public complaints through an online aspiration and complaint service called “LaporGub..!”. To make incoming reports easier to group, report topics are identified by clustering. Text mining converts the text data into numeric data so that it can be processed further. Clustering is divided into soft (fuzzy) clustering and hard clustering. Hard clustering divides the data strictly into clusters, with no overlapping membership between clusters. Soft clustering can place a data point in several clusters, each with a certain degree of membership. Because of these graded memberships, fuzzy grouping gives more natural results than hard clustering: objects on the boundary between several classes are not forced to belong fully to one class, and each object is instead assigned a degree of membership. Fuzzy c-means has the advantage of placing cluster centers more precisely than other clustering methods, because it refines the cluster centers iteratively. The best number of clusters is chosen as the one with the maximum silhouette coefficient. A word cloud, a form of text data visualization, is used to determine the dominant topic in each cluster. The results show that the silhouette coefficient for fuzzy c-means clustering is highest with three clusters. The first cluster, with 449 reports, yields a word cloud about road conditions; the second, with 964 reports, a word cloud about covid assistance; and the third, with 176 reports, a word cloud about fertilizer for farmers. Covid assistance is therefore the topic with the most reports.
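
For readers unfamiliar with the membership degrees and the repeated center refinement mentioned above, the standard fuzzy c-means updates can be written as follows. The abstract does not give its notation, so the symbols are assumptions: $x_i$ is the $i$-th of $n$ objects, $v_j$ the center of the $j$-th of $c$ clusters, $u_{ij}$ the membership of object $i$ in cluster $j$, and $m > 1$ the fuzzifier. This is the textbook formulation, not necessarily the exact variant used in the study.

```latex
u_{ij} = \frac{1}{\displaystyle\sum_{k=1}^{c}
         \left( \frac{\lVert x_i - v_j \rVert}{\lVert x_i - v_k \rVert} \right)^{\frac{2}{m-1}}},
\qquad
v_j = \frac{\displaystyle\sum_{i=1}^{n} u_{ij}^{\,m}\, x_i}
           {\displaystyle\sum_{i=1}^{n} u_{ij}^{\,m}},
\qquad
\sum_{j=1}^{c} u_{ij} = 1 .
```

The two updates are alternated until the memberships change by less than a chosen tolerance.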


2022 ◽  
Vol 2 (2) ◽  
pp. 83-89
Author(s):  
Ahmad Harmain ◽  
Paiman Paiman ◽  
Henri Kurniawan ◽  
Kusrini Kusrini ◽  
Dina Maulina

Indonesia lies in the tropics and has a very high fire potential, especially in the dry season, so concrete mitigation steps are needed to minimize the potential for forest fires. This requires a more capable and up-to-date technological method for mapping the areas with a high potential for forest fires. Imagery and information from the MODIS satellite system are one source of information on the condition of the earth's surface: the parameters Latitude, Longitude, Brightness, FRP (Fire Radiative Power), and Confidence can serve as the basis for grouping areas by whether or not they have fire potential. K-Means is a machine learning method that can be used to group these areas. The accuracy of the K-Means grouping can be tested with the Davies Bouldin Index (DBI) and the Silhouette Coefficient.
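
The Davies Bouldin Index named above can be sketched in a few lines. This is an illustrative implementation under stated assumptions (Euclidean distance, centroids as cluster representatives, a hypothetical function name), not the code used in the study; lower values indicate more compact, better separated clusters.

```python
import math


def davies_bouldin_index(points, labels):
    """Average, over clusters, of the worst (S_i + S_j) / M_ij ratio."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the index needs at least two clusters")

    # Centroid of each cluster (coordinate-wise mean).
    centroids = {
        label: tuple(sum(coords) / len(members) for coords in zip(*members))
        for label, members in clusters.items()
    }
    # S_i: mean distance from the members of cluster i to its centroid.
    scatter = {
        label: sum(math.dist(p, centroids[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }
    # For each cluster, take the most similar other cluster.
    # Assumes no two centroids coincide (M_ij > 0).
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
        for i in clusters
    ]
    return sum(worst) / len(worst)
```

Unlike the silhouette coefficient, which is maximized, this index is minimized when comparing candidate clusterings.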


2022 ◽  
Author(s):  
Lucio Laureti ◽  
Costantiello Alberto ◽  
Marco Maria Matarrese ◽  
Angelo Leogrande

In this article we evaluate the determinants of Employment in Innovative Enterprises in Europe. We use data from the European Innovation Scoreboard of the European Commission for 36 countries in the period 2000-2019 with Panel Data with Fixed Effects, Panel Data with Random Effects, Dynamic Panel, WLS and Pooled OLS. We find that “Employment in Innovative Enterprises in Europe” is positively associated with “Broadband Penetration in Europe”, “Foreign Controlled Enterprises Share of Value Added”, “Innovation Index”, and “Medium and High-Tech Product Exports”, and negatively associated with “Basic School Entrepreneurial Education and Training”, “International Co-Publications”, and “Marketing or Organizational Innovators”. Second, we perform a cluster analysis with the k-Means algorithm optimized with the Silhouette Coefficient and find four different clusters. Finally, we compare eight different machine learning algorithms for predicting the level of “Employment in Innovative Enterprises” in Europe and find that Linear Regression is the best predictor.


2021 ◽  
Vol 4 (2) ◽  
pp. 150-167
Author(s):  
Laurence - - ◽  
Devanny Gumulya ◽  
J. Sandra Sembel ◽  
Magdalena Lestari Ginting

Tourism is an important contributor to a country's economy. This study focuses on foreign tourist visits to Japan, using data on the number of visiting tourists and on tourist spending in the categories of accommodation, entertainment, food and beverages, shopping, transportation, and others. Previous studies did not group countries by these types of spending, so this study fills that gap by clustering countries according to tourist spending. The study also aims to build a forecasting model using the ARIMA method that accommodates trend and seasonality. The data, consisting of six types of spending, were reduced to two dimensions that explain 83.84% of the variance. The results show two groups of tourist countries based on spending: 8 OECD member countries and 12 non-OECD countries. Tourists from OECD countries play an important role in the world economy, contributing 50.5% of total world tourist spending. Cluster quality is categorized as good, with an average silhouette coefficient and cohesion value of 0.56. This grouping can serve as a basis for studying consumer behavior in each country. ARIMA forecasting can be used by including trend and seasonal elements in the model. The R² values of the forecasting model are good for most of the tourist data from the 20 countries. This seasonal ARIMA model can be considered as a model for forecasting the number of arriving tourists. Keywords: Principal component analysis, k-means clustering, silhouette coefficient and cohesion values, ARIMA


2021 ◽  
Vol 12 (1) ◽  
pp. 298
Author(s):  
Abeer Elkhouly ◽  
Allan Melvin Andrew ◽  
Hasliza A. Rahim ◽  
Nidhal Abdulaziz ◽  
Mohamedfareq Abdulmalek ◽  
...  

The current practice of adjusting hearing aids (HA) is tiring and time-consuming for both patients and audiologists. Of hearing-impaired people, 40–50% are not satisfied with their HAs. In addition, good HA designs are often avoided because the process of fitting them is exhausting. To improve the fitting process, an unsupervised machine learning (ML) approach is proposed to cluster pure-tone audiograms (PTA). This work applies the spectral clustering (SP) approach to group audiograms according to their similarity in shape. Different SP approaches are tested, and they are evaluated with the Silhouette, Calinski-Harabasz, and Davies-Bouldin criteria. The Kutools for Excel add-in is used to generate a population of audiograms, which is annotated using the results from SP, and the different criteria values are used to evaluate the population clusters. Finally, these clusters are mapped to a standard set of audiograms used in HA characterization. The results indicate that grouping the data into 8 or 10 groups yields clusters with high evaluation criteria values. The evaluation of the population audiogram clusters shows good performance, with a Silhouette coefficient >0.5. This work introduces a new concept for classifying audiograms with an ML algorithm according to their similarity in shape.


2021 ◽  
Vol 6 (3) ◽  
pp. 187
Author(s):  
Cepy Sukmayadi ◽  
Aji Primajaya ◽  
Iqbal Maulana

Flood disasters often occur during the rainy season, and Karawang is one area that is often flooded. Based on the risk index from BNPB, flooding in Karawang affects 84% of the community, so efforts are needed to reduce and overcome flood disasters. Such efforts must begin by identifying which areas are prone to flooding. This study therefore aims to determine the flood-prone areas in Karawang as an initial step in tackling flood disasters. The research classifies flood-prone areas using the k-medoids algorithm. K-medoids is a partitional clustering method that groups a set of objects into a number of clusters, using an actual object from the set as the representative of each cluster. The attributes used are flood-causing factors: rainfall, elevation (land height), population density, and distance to the river. The study found three levels of flood potential, namely low, medium, and high. There is 1 sub-district with low flood potential, 24 sub-districts with medium flood potential, and 5 sub-districts with high flood potential. Testing with the silhouette coefficient gives a value of 0.370.


2021 ◽  
Vol 1 (1) ◽  
pp. 190-199
Author(s):  
Rijalul Fikri ◽  
Aswin Mushardiyanto ◽  
Mochamad Naufal Laudza’Banin ◽  
Kristiana Maureen ◽  
Harry Patria

Based on the 2020 dataset of district/city poverty information published by Statistics Indonesia (Badan Pusat Statistik), twenty independent variables were selected for this study. A correlation test among these variables showed that some are very highly correlated, with correlation values of 0.921 (Percentage of Poor Population and P1 (Poverty Gap Index)) and 0.964 (P1 (Poverty Gap Index) and P2 (Poverty Severity Index)). Using very highly correlated variables causes multicollinearity, so one option for removing it is Principal Component Analysis (PCA). Using the cumulative proportion of variance with a minimum of 80% of the data variability, PCA yielded three new data dimensions, that is, three new independent variables. With the new input variables PCA 0, PCA 1, and PCA 2, the number of clusters was determined with the Silhouette Coefficient method and the clustering analysis was performed with K-Means, yielding four clusters: cluster 1 with 117 districts/cities, cluster 2 with 154, cluster 3 with 173, and cluster 4 with 70.
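
The cumulative-variance rule mentioned above (keep the fewest principal components that together explain at least 80% of the variance) can be sketched as follows. The function name and the eigenvalues in the usage example are hypothetical and are not taken from the study.

```python
def components_for_variance(eigenvalues, threshold=0.80):
    """Smallest number of components whose cumulative variance share reaches the threshold."""
    ordered = sorted(eigenvalues, reverse=True)  # largest variance first
    total = sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running / total >= threshold:
            return count
    return len(ordered)


# Hypothetical eigenvalues of a covariance matrix, for illustration only.
example = [6.0, 2.5, 1.0, 0.5]
n_components = components_for_variance(example)
```

With these made-up eigenvalues the first component explains 60% of the variance and the first two explain 85%, so the rule keeps two components.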


2021 ◽  
Vol 14 (2) ◽  
pp. 368-372
Author(s):  
Ariel Kristianto

Public transportation is one of the important modes of transportation and is the backbone of transportation in Indonesia. Its development is supported by the government, as is evident in national policy, namely the National Medium Term Development Plan (RPJMN). Although public transportation is an effective mode of transportation, its development faces an obstacle: how to meet customers' preferences when they choose a mode of transportation. This research focuses on several variables, namely age, gender, income, cost, speed, comfort, safety, efficiency, and flexibility. The influential variables are sought with the K-Means and DBSCAN clustering algorithms, and the performance of the two algorithms is compared to find the better one. The Silhouette Coefficient results show that DBSCAN, with a value of 0.99, performs better than K-Means, with a value of 0.86. The variables that most affect interest in using public transportation are cost, speed, comfort, safety, efficiency, and flexibility.


2021 ◽  
Vol 1 ◽  
pp. 101-116
Author(s):  
Moch Rizky Wijaya ◽  
Gigih Satriyo Wibowo

Internet banking customers are growing very rapidly. Customer segmentation can be applied based on internet banking data. Clustering is an unsupervised data mining technique that can be used for customer segmentation. This study builds a clustering model on bank customer profile data based on credit data, producing a customer segmentation that later serves as the basis for marketing strategy decisions. Clustering uses the K-Means method, with cluster validation by the Silhouette coefficient. Based on the Silhouette coefficient, the best value was obtained for 3 clusters, namely clusters 0, 1, and 2. The k-means results divide into 3 clusters: cluster 0, with lower average credit amount, short duration, and older customers; cluster 1, with high average credit amount, long duration, and middle-aged customers; and cluster 2, with lower average credit amount, short duration, and young customers. The segmentation results can serve as a reference for future marketing strategy.

