silhouette coefficient Latest Research Papers

PENERAPAN TEXT MINING UNTUK MELAKUKAN CLUSTERING DATA TWEET AKUN BLIBLI PADA MEDIA SOSIAL TWITTER MENGGUNAKAN K-MEANS CLUSTERING

Jurnal Gaussian ◽

10.14710/j.gauss.v10i4.30409 ◽

2022 ◽

Vol 10 (4) ◽

pp. 583-593

Author(s):

Syiva Multi Fani ◽

Rukun Santoso ◽

Suparti Suparti

Keyword(s):

Social Media ◽

Text Mining ◽

Virtual Networks ◽

Number Of Clusters ◽

Silhouette Coefficient ◽

Twitter Account ◽

Computer Based ◽

Twitter Users ◽

Clustering Data ◽

Coefficient Method

Social media is computer-based technology that facilitates the sharing of ideas, thoughts, and information through virtual networks and communities. Twitter is one of the most popular social media platforms in Indonesia, with 78 million users. Businesses rely heavily on Twitter for advertising. By knowing which types of tweet content are most often retweeted by their followers, businesses can use those types of content to advertise to Twitter users. This study applies text mining to cluster tweet data from the @bliblidotcom Twitter account using the K-means clustering method, with the best number of clusters obtained from the Silhouette Coefficient method, in order to determine which types of tweet content are most often retweeted by @bliblidotcom followers. The tweets with the most retweets and favorites are discount offers and flash sales, so Blibli Indonesia could use this kind of tweet for advertising on Twitter; prize quiz tweets are also liked by followers of the @bliblidotcom account.
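To make the "best number of clusters from the Silhouette Coefficient" step concrete, here is a minimal sketch in plain Python. It is not the authors' code: the toy points and the two candidate labelings are hypothetical stand-ins for K-means output at k = 2 and k = 3, and Euclidean distance is an assumption.

```python
import math

def mean_silhouette(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels_k2 = [0, 0, 0, 1, 1, 1]   # assumed K-means result for k = 2
labels_k3 = [0, 0, 2, 1, 1, 1]   # assumed K-means result for k = 3

score_k2 = mean_silhouette(points, labels_k2)
score_k3 = mean_silhouette(points, labels_k3)
best_k = 2 if score_k2 >= score_k3 else 3
```

On this toy data the two-cluster labeling scores higher, so k = 2 would be selected; in practice the same comparison is repeated over a range of candidate k values.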

ANALISIS KECENDERUNGAN LAPORAN MASYARAKAT PADA “LAPORGUB..!” PROVINSI JAWA TENGAH MENGGUNAKAN TEXT MINING DENGAN FUZZY C-MEANS CLUSTERING

Jurnal Gaussian ◽

10.14710/j.gauss.v10i4.33101 ◽

2022 ◽

Vol 10 (4) ◽

pp. 544-553

Author(s):

Ratna Kurniasari ◽

Rukun Santoso ◽

Alan Prahutama

Keyword(s):

Text Mining ◽

Cluster Center ◽

Text Data ◽

Fuzzy C Means ◽

Word Cloud ◽

Silhouette Coefficient ◽

Degree Of Membership ◽

Fuzzy C Means Clustering ◽

Hard Clustering ◽

The Government

Effective communication between the government and society is essential to achieving good governance. The government provides a channel for public complaints through an online aspiration and complaint service called “LaporGub..!”. To make incoming reports easier to group, report topics are identified using clustering. Text mining is used to convert text data into numeric data so that it can be processed further. Clustering is classified as either soft (fuzzy) clustering or hard clustering. Hard clustering divides data into clusters strictly, with no overlapping membership between clusters. Soft clustering can place a data point in several clusters, each with a certain degree of membership. Because objects at the boundary between several classes are not forced to fit fully into one class but are instead assigned a degree of membership in each, fuzzy grouping gives more natural results than hard clustering. Fuzzy c-means has the advantage of placing cluster centers more precisely than other clustering methods, because it improves the cluster centers repeatedly. The best number of clusters is chosen by the maximum silhouette coefficient. A word cloud, a form of text data visualization, is used to determine the dominant topic in each cluster. The results show that the silhouette coefficient for fuzzy c-means clustering is highest with three clusters. The first cluster, with 449 reports, produces a word cloud about road conditions; the second, with 964 reports, about COVID assistance; and the third, with 176 reports, about fertilizer for farmers. COVID assistance is therefore the topic with the most reports.
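The degree-of-membership idea and the repeated improvement of cluster centers can be sketched as one iteration of the standard fuzzy c-means updates. This is an illustrative sketch, not the paper's implementation: the fuzzifier m = 2, the Euclidean distance, and the toy points and starting centers are all assumptions.

```python
import math

def fcm_memberships(points, centers, m=2.0):
    """Membership of each point in each cluster (standard fuzzy c-means update)."""
    exponent = 2.0 / (m - 1.0)
    result = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # The point coincides with one or more centers: split membership among them.
            hits = dists.count(0.0)
            result.append([1.0 / hits if d == 0.0 else 0.0 for d in dists])
            continue
        result.append(
            [1.0 / sum((d_j / d_k) ** exponent for d_k in dists) for d_j in dists]
        )
    return result

def fcm_centers(points, memberships, m=2.0):
    """Recompute each center as the membership-weighted mean of all points."""
    dim = len(points[0])
    centers = []
    for j in range(len(memberships[0])):
        weights = [row[j] ** m for row in memberships]
        total = sum(weights)
        centers.append(
            tuple(sum(w * p[d] for w, p in zip(weights, points)) / total
                  for d in range(dim))
        )
    return centers

# Hypothetical data: the third point lies exactly between the two starting centers.
points = [(0.0, 0.0), (4.0, 0.0), (2.0, 0.0)]
centers = [(0.0, 0.0), (4.0, 0.0)]

u = fcm_memberships(points, centers)   # boundary point gets 0.5 in each cluster
new_centers = fcm_centers(points, u)   # centers are pulled toward the shared point
```

The boundary point is not forced into either cluster; it contributes to both centers with a reduced weight, which is the behavior the abstract contrasts with hard clustering.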

NORMALISASI DATA UNTUK EFISIENSI K-MEANS PADA PENGELOMPOKAN WILAYAH BERPOTENSI KEBAKARAN HUTAN DAN LAHAN BERDASARKAN SEBARAN TITIK PANAS

TEKNIMEDIA: Teknologi Informasi dan Multimedia ◽

10.46764/teknimedia.v2i2.49 ◽

2022 ◽

Vol 2 (2) ◽

pp. 83-89

Author(s):

Ahmad Harmain ◽

Paiman Paiman ◽

Henri Kurniawan ◽

Kusrini Kusrini ◽

Dina Maulina

Keyword(s):

Machine Learning ◽

Silhouette Coefficient ◽

Radiative Power ◽

Fire Radiative Power

Indonesia lies in the tropics and has a very high potential for fires, especially in the dry season, so concrete mitigation steps are needed to minimize the potential for forest fires. This requires a more capable and up-to-date technological method for mapping the areas with a high potential for forest fires. Imagery and information from the MODIS satellite system are one source of information on the condition of the earth's surface; the parameters Latitude, Longitude, Brightness, FRP (Fire Radiative Power), and Confidence can serve as the basis for grouping areas by whether or not they have fire potential. K-Means is a machine learning method that can be used to group these areas. The accuracy of the K-Means grouping results can be tested with the Davies Bouldin Index (DBI) and the Silhouette Coefficient.
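As an illustration of the Davies Bouldin Index named above, the following sketch computes it in plain Python. The data and labelings are hypothetical and are not MODIS hotspot records; Euclidean distance and centroid-based scatter are the usual definition, assumed here. Lower values indicate more compact, better-separated clusters.

```python
import math

def davies_bouldin(points, labels):
    """Davies Bouldin Index: average over clusters of the worst-case
    (scatter_i + scatter_j) / centroid_distance(i, j) ratio."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the index needs at least two clusters")

    # Centroid of each cluster (coordinate-wise mean).
    centroids = {
        lab: tuple(sum(coord) / len(pts) for coord in zip(*pts))
        for lab, pts in clusters.items()
    }
    # Scatter: mean distance of a cluster's points to its own centroid.
    scatter = {
        lab: sum(math.dist(p, centroids[lab]) for p in pts) / len(pts)
        for lab, pts in clusters.items()
    }
    total = 0.0
    for i in clusters:
        # Assumes distinct centroids; coinciding centroids would divide by zero.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)

# Hypothetical data: two tight pairs of points far apart from each other.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
dbi_good = davies_bouldin(points, [0, 0, 1, 1])  # pairs kept together
dbi_bad = davies_bouldin(points, [0, 1, 0, 1])   # pairs split across clusters
```

The labeling that keeps each tight pair together gets the lower index, which is how the measure would rank competing K-Means results.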

The Employment in Innovative Enterprises in Europe

10.21203/rs.3.rs-1221158/v1 ◽

2022 ◽

Author(s):

Lucio Laureti ◽

Costantiello Alberto ◽

Marco Maria Matarrese ◽

Angelo Leogrande

Keyword(s):

Machine Learning ◽

Panel Data ◽

Fixed Effects ◽

Value Added ◽

Machine Learning Algorithms ◽

High Tech ◽

Dynamic Panel ◽

Silhouette Coefficient ◽

High Tech Product ◽

And Training

In this article we evaluate the determinants of Employment in Innovative Enterprises in Europe. We use data from the European Innovation Scoreboard of the European Commission for 36 countries over the period 2000-2019, estimated with Panel Data with Fixed Effects, Panel Data with Random Effects, Dynamic Panel, WLS, and Pooled OLS. We find that “Employment in Innovative Enterprises in Europe” is positively associated with “Broadband Penetration in Europe”, “Foreign Controlled Enterprises Share of Value Added”, “Innovation Index”, and “Medium and High-Tech Product Exports”, and negatively associated with “Basic School Entrepreneurial Education and Training”, “International Co-Publications”, and “Marketing or Organizational Innovators”. Second, we perform a cluster analysis with the k-Means algorithm optimized with the Silhouette Coefficient and find four different clusters. Finally, we compare eight different machine learning algorithms for predicting the level of “Employment in Innovative Enterprises” in Europe and find that Linear Regression is the best predictor.

Analisis Peramalan dan Pengelompokan Jumlah Turis ke Jepang

Journal of Integrated System ◽

10.28932/jis.v4i2.3164 ◽

2021 ◽

Vol 4 (2) ◽

pp. 150-167

Author(s):

Laurence - - ◽

Devanny Gumulya ◽

J. Sandra Sembel ◽

Magdalena Lestari Ginting

Keyword(s):

Principal Component Analysis ◽

Principal Component ◽

Component Analysis ◽

Silhouette Coefficient

Tourism is one of the important contributors to a country's economy. This research focuses on foreign tourist visits to Japan, using data on the number of visiting tourists and on tourist spending in the categories of accommodation, entertainment, food and beverages, shopping, transportation, and others. Previous studies did not group countries by these types of spending, so this study fills that gap by grouping countries based on tourist spending. The study also builds a forecasting model using the ARIMA method, which accommodates trend and seasonality. The data, consisting of six types of spending, were reduced to two dimensions with an explained variance of 83.84%. The results show two groups of tourist countries based on spending, consisting of 8 OECD member countries and 12 non-OECD countries. Tourists from OECD member countries play an important role in the world economy, contributing 50.5% of total world tourist spending. Cluster quality is categorized as good, with an average silhouette coefficient and cohesion value of 0.56. This grouping can serve as a basis for studying consumer behavior in each country. ARIMA forecasting can be used by including trend and seasonal elements in the model. The R2 values of the forecasting model show good results for most of the tourist data from the 20 countries. This seasonal ARIMA model can be considered as a model for forecasting the number of arriving tourists. Keywords: Principal component analysis, k-means clustering, silhouette coefficient and cohesion values, ARIMA

A Novel Unsupervised Spectral Clustering for Pure-Tone Audiograms towards Hearing Aid Filter Bank Design and Initial Configurations

Applied Sciences ◽

10.3390/app12010298 ◽

2021 ◽

Vol 12 (1) ◽

pp. 298

Author(s):

Abeer Elkhouly ◽

Allan Melvin Andrew ◽

Hasliza A. Rahim ◽

Nidhal Abdulaziz ◽

Mohamedfareq Abdulmalek ◽

...

Keyword(s):

Hearing Aids ◽

Spectral Clustering ◽

Pure Tone ◽

Filter Bank ◽

Current Practice ◽

Evaluation Criteria ◽

Silhouette Coefficient ◽

Impaired People ◽

Unsupervised Approach ◽

Standard Set

The current practice of adjusting hearing aids (HA) is tiring and time-consuming for both patients and audiologists. Of hearing-impaired people, 40–50% are not satisfied with their HAs. In addition, good HA designs are often avoided because the process of fitting them is exhausting. To improve the fitting process, an unsupervised machine learning (ML) approach is proposed to cluster pure-tone audiograms (PTA). This work applies the spectral clustering (SP) approach to group audiograms according to their similarity in shape. Different SP approaches are tested for the best results and evaluated by the Silhouette, Calinski-Harabasz, and Davies-Bouldin criteria values. The Kutools for Excel add-in is used to generate a population of audiograms, which is annotated using the SP results, and different criteria values are used to evaluate the population clusters. Finally, these clusters are mapped to a standard set of audiograms used in HA characterization. The results indicate that grouping the data into 8 or 10 groups yields clusters with high evaluation criteria values. The evaluation of the population audiogram clusters shows good performance, with a Silhouette coefficient >0.5. This work introduces a new concept for classifying audiograms with an ML algorithm according to their similarity in shape.

Penerapan Algoritma K-Medoids dalam Menentukan Daerah Rawan Banjir di Kabupaten Karawang

INFORMAL: Informatics Journal ◽

10.19184/isj.v6i3.25423 ◽

2021 ◽

Vol 6 (3) ◽

pp. 187

Author(s):

Cepy Sukmayadi ◽

Aji Primajaya ◽

Iqbal Maulana

Keyword(s):

Risk Index ◽

Test Results ◽

Clustering Method ◽

Number Of Clusters ◽

A Value ◽

Silhouette Coefficient ◽

Flood Disasters ◽

High Flood ◽

Partition Clustering ◽

Flood Prone Areas

Flood disasters often occur during the rainy season, and Karawang is one area that is often flooded. Based on the risk index from BNPB, flood disasters in Karawang affect 84% of the community, so efforts are needed to reduce and overcome them. A first step in such efforts is knowing which areas are prone to flooding. This study therefore aims to determine the flood-prone areas in Karawang as an initial effort in tackling flood disasters. The research groups flood-prone areas using the k-medoids algorithm. K-medoids is a partition clustering method that groups a set of objects into a number of clusters, using an actual object from the collection to represent each cluster. The attributes used are flood-causing factors such as rainfall, elevation (land height), population density, and distance to the river. The study found three levels of flood potential: low, medium, and high. There is 1 sub-district with low flood potential, 24 sub-districts with moderate flood potential, and 5 sub-districts with high flood potential. Testing with the silhouette coefficient gives a value of 0.370.
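The defining feature of k-medoids, that each cluster is represented by one of its own objects, can be sketched as follows. This is a simplified illustration, not the study's code: the one-attribute toy data are hypothetical (the study uses rainfall, elevation, population density, and distance to the river), and Euclidean distance is an assumption.

```python
import math

def medoid(cluster):
    """Return the member with the smallest total distance to all other members."""
    return min(cluster, key=lambda c: sum(math.dist(c, other) for other in cluster))

def assign(points, medoids):
    """Assign each point to the index of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda j: math.dist(p, medoids[j]))
        for p in points
    ]

# Hypothetical cluster with one outlier at x = 20.
cluster = [(0.0,), (1.0,), (2.0,), (3.0,), (20.0,)]
rep = medoid(cluster)                                   # an actual data point
mean_x = sum(p[0] for p in cluster) / len(cluster)      # the mean is pulled toward the outlier

# Assignment step against two hypothetical medoids.
labels = assign(cluster, [(2.0,), (20.0,)])
```

A full k-medoids run alternates the assignment step with re-selecting each cluster's medoid until the medoids stop changing; the sketch shows only those two building blocks.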

Pengelompokan Kabupaten/Kota di Indonesia Berdasarkan Informasi Kemiskinan Tahun 2020 Menggunakan Metode K-Means Clustering Analysis

Seminar Nasional Teknik dan Manajemen Industri ◽

10.28932/sentekmi2021.v1i1.76 ◽

2021 ◽

Vol 1 (1) ◽

pp. 190-199

Author(s):

Rijalul Fikri ◽

Aswin Mushardiyanto ◽

Mochamad Naufal Laudza’Banin ◽

Kristiana Maureen ◽

Harry Patria

Keyword(s):

Principal Component Analysis ◽

Clustering Analysis ◽

Principal Component ◽

Severity Index ◽

Component Analysis ◽

Silhouette Coefficient ◽

Cluster 2

Based on a dataset of 2020 district/city poverty information issued by Badan Pusat Statistik Indonesia (Statistics Indonesia), twenty independent variables were selected for this study. A correlation test among these independent variables showed that some are very highly correlated, with correlation values of 0.921 (Percentage of Poor Population - P1 (Poverty Gap Index)) and 0.964 (P1 (Poverty Gap Index) - P2 (Poverty Severity Index)). Using very highly correlated variables would cause multicollinearity, so one option for removing it is Principal Component Analysis (PCA). Using the cumulative proportion of variance with a minimum of 80% of the data variability, PCA yielded three new data dimensions, that is, three new independent variables. With the new input variables PCA 0, PCA 1, and PCA 2, the number of clusters was determined with the Silhouette Coefficient method and the clustering analysis was carried out with the K-Means method, yielding four groups/clusters: cluster 1 with 117 districts/cities, cluster 2 with 154, cluster 3 with 173, and cluster 4 with 70.
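The rule of keeping enough principal components to reach a minimum cumulative proportion of variance can be sketched as below. The eigenvalues are hypothetical and chosen only so that the 80% threshold is crossed at three components; they are not the study's values, and the function name is illustrative.

```python
def components_for_variance(eigenvalues, threshold=0.80):
    """Smallest number of leading components whose cumulative share of
    total variance reaches the threshold, plus that cumulative share."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return count, cumulative / total
    return len(ordered), 1.0

# Hypothetical eigenvalues of a covariance (or correlation) matrix.
eigenvalues = [9.0, 4.5, 3.0, 1.5, 1.0, 1.0]
n_components, explained = components_for_variance(eigenvalues, threshold=0.80)
```

With these assumed eigenvalues the first two components explain 67.5% of the variance and the third brings the total past 80%, so three components are kept and would then be passed to the clustering step.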

Analisa Performa K-Means dan DBSCAN dalam Clustering Minat Penggunaan Transportasi Umum

Elkom : Jurnal Elektronika dan Komputer ◽

10.51903/elkom.v14i2.551 ◽

2021 ◽

Vol 14 (2) ◽

pp. 368-372

Author(s):

Ariel Kristianto

Keyword(s):

Public Transportation ◽

Clustering Algorithms ◽

National Policy ◽

Government Support ◽

Medium Term ◽

Dbscan Clustering ◽

A Value ◽

Silhouette Coefficient ◽

Mode Of Transportation ◽

The Government

Public transportation is one of the important modes of transportation and is the backbone of transportation in Indonesia. Its development is supported by the government, as is evident in national policy, namely the National Medium Term Development Plan (RPJMN). Although public transportation is an effective mode of transportation, its development faces an obstacle: how to meet customers' preferences when they choose a mode of transportation. This research focuses on several variables: age, gender, income, cost, speed, comfort, safety, efficiency, and flexibility. The influential variables are sought using the K-Means and DBSCAN clustering algorithms, and the performance of the two algorithms is compared to find the better one. The Silhouette Coefficient results show that DBSCAN performs better, with a value of 0.99, than K-Means, with a value of 0.86. The variables that most affect interest in using public transportation are cost, speed, comfort, safety, efficiency, and flexibility.

Customer Segmentation berdasarkan Usia, Jumlah Kredit dan Lama Kredit Nasabah di Bank XYZ menggunakan Model K-Means Clustering

Prosiding Seminar Nasional Universitas Ma Chung ◽

10.33479/snumc.v1i.228 ◽

2021 ◽

Vol 1 ◽

pp. 101-116

Author(s):

Moch Rizky Wijaya ◽

Gigih Satriyo Wibowo

Keyword(s):

Data Mining ◽

Internet Banking ◽

Customer Segmentation ◽

Silhouette Coefficient ◽

Cluster 2

The number of internet banking customers is growing very quickly. Customer segmentation can be applied based on internet banking data. Clustering is an unsupervised data mining technique that can be used for customer segmentation. This study builds a clustering model on bank customer profile data based on credit data, producing a customer segmentation that will serve as a basis for marketing strategy decisions. Clustering uses the K-Means method, with cluster validation by the Silhouette coefficient. Based on the Silhouette coefficient, the best value was obtained for 3 clusters, namely clusters 0, 1, and 2. The k-means results divide customers into 3 clusters: cluster 0 has a lower average credit amount, short duration, and older customers; cluster 1 has a high average credit amount, long duration, and middle-aged customers; and cluster 2 has a lower average credit amount, short duration, and young customers. These segmentation results can be used as a reference for future marketing strategy.
