Unsupervised Twitter Sentiment Analysis on The Revision of Indonesian Code Law and the Anti-Corruption Law using Combination Method of Opinion Word and Agglomerative Hierarchical Clustering

2020 ◽  
Vol 8 (1) ◽  
pp. 200-220
Nur Restu Prayoga ◽  
Tresna Maulana Fahrudin ◽  
Made Kamisutara ◽  
Angga Rahagiyanto ◽  
Tahegga Primananda Alfath ◽  

The rejection of the ratification of the revision of the Indonesian Code Law, known as RKUHP, and of the Corruption Law has raised opinions from various perspectives on social media. Twitter, one of the platforms affected, has more than 19.5 million users in Indonesia, where people use it to share their views, arguments, information, and opinions from all points of view. Because Twitter has such a diverse user base, a system is needed to determine the tendency of opinion toward a given problem or object. The purpose of this study is to analyze the sentiment of tweets rejecting the revision of the law, determining whether they carry positive or negative sentiment, using the Agglomerative Hierarchical Clustering method. The data were obtained by crawling tweets with the hashtag #ReformasiDikorupsi. The next stage is pre-processing, which consists of case folding, tokenizing, cleansing, sanitizing, and stemming. Feature extraction uses opinion words and Term Frequency (TF) and is performed automatically. In the clustering stage, two clusters are formed using three approaches: single linkage, complete linkage, and average linkage. In the evaluation phase, the authors use the error ratio, confusion matrix, and silhouette coefficient. The results are reasonably good: on 2408 tweets, the highest accuracy is 61.6%.
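The pre-processing and feature-extraction stages described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the two word sets are hypothetical stand-ins for an opinion lexicon the abstract does not list, the function names are invented here, and stemming is left out because it needs a language-specific stemmer.

```python
import re
from collections import Counter

# Hypothetical, tiny opinion lexicons for illustration only; the study's
# actual word lists are not given in the abstract.
POSITIVE_WORDS = {"dukung", "setuju", "bagus"}
NEGATIVE_WORDS = {"tolak", "korupsi", "buruk"}


def preprocess(tweet: str) -> list[str]:
    """Case-fold, strip URLs, mentions and non-letters, then tokenize."""
    text = tweet.lower()                             # case folding
    text = re.sub(r"https?://\S+|@\w+", " ", text)   # cleansing: URLs, mentions
    text = text.replace("#", " ")                    # keep hashtag word, drop '#'
    text = re.sub(r"[^a-z\s]", " ", text)            # drop digits and punctuation
    return text.split()                              # tokenizing on whitespace


def features(tokens: list[str]) -> tuple[int, int]:
    """Return (positive count, negative count) based on term frequencies."""
    tf = Counter(tokens)                             # term frequency per token
    pos = sum(tf[w] for w in POSITIVE_WORDS)
    neg = sum(tf[w] for w in NEGATIVE_WORDS)
    return pos, neg
```

Each tweet then becomes a small numeric vector that a clustering step can group into two clusters; how the study actually combines opinion words with TF is not specified in the abstract.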

2022 ◽  
Vol 10 (4) ◽  
pp. 583-593
Syiva Multi Fani ◽  
Rukun Santoso ◽  
Suparti Suparti

Social media is computer-based technology that facilitates the sharing of ideas, thoughts, and information through the building of virtual networks and communities. Twitter is one of the most popular social media platforms in Indonesia, with 78 million users. Businesses rely heavily on Twitter for advertising. By knowing which types of tweet content are most retweeted by their followers, businesses can use those types of content to advertise to Twitter users. This study applies text mining to cluster tweets from the @bliblidotcom Twitter account using the K-means clustering method, with the best number of clusters chosen by the Silhouette Coefficient method, in order to determine which types of tweet content are most retweeted by @bliblidotcom followers. The tweets with the most retweets and favorites are discount offers and flash sales, so Blibli Indonesia could use this kind of tweet for advertising on Twitter. Prize quiz tweets are also liked by followers of the @bliblidotcom account.
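Since the number of clusters here is chosen with the Silhouette Coefficient, a small sketch of that score may help. It uses the standard definition (mean of (b - a) / max(a, b) over all points) with Euclidean distance as an assumed default; the function names are illustrative and not taken from the paper.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(points, labels, dist=euclidean):
    """Mean silhouette over all points; points in singleton clusters score 0."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # s(i) = 0 by convention for a singleton cluster
        # a: mean distance to the other members of the same cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)
```

To pick a cluster count, one would run the clustering for each candidate k, score each labeling with `silhouette_score`, and keep the k with the highest mean score.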

2021 ◽  
Vol 15 (2) ◽  
pp. 63
Desy Exasanti ◽  
Arief Jananto

Abstract. Clustering is a method for grouping data whose class labels are already known in order to find new clusters from observations. Many clustering methods exist, including centroid-based, hierarchical, density-based, and grid-based methods; this study uses a hierarchical method. A hierarchical method groups objects by building a hierarchy of clusters, although this does not always resemble the hierarchy of an organization. The study uses Agglomerative Hierarchical Clustering, a bottom-up approach in which each object under test is initially treated as a single-object cluster, and iterations then merge clusters into progressively larger ones. The data are non-fire incident records from the Semarang City Fire Department, used to group areas by non-fire incident handling. The Fire Department handles more than fires: firefighters also respond to many non-fire incidents such as reptile evacuation, cat evacuation, rescue of accident victims, and so on. The non-fire data from 16 districts in Semarang City in 2019 are tested with three algorithms: Single Linkage, Average Linkage, and Complete Linkage. Single Linkage merges on the smallest distance between data objects, Average Linkage on the average distance between data objects, and Complete Linkage on the largest distance. The implementation and visualization of the test data use WEKA 3.8.4; WEKA (Waikato Environment for Knowledge Analysis) is software written in the Java programming language.
From a dataset of 380 records, a sample of 100 records was tested in WEKA using the Manhattan distance with 3 clusters. The results can be visualized as a dendrogram with the visualize tree feature, and as a chart with the visualize cluster assignments feature.
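For reference, the Manhattan distance and the three linkage criteria named in this abstract can be written in their standard textbook forms. The symbols below ($x$, $y$, $p$, $A$, $B$) are notation introduced here, not taken from the paper.

```latex
% Manhattan distance between two records x and y with p attributes
d(x, y) = \sum_{k=1}^{p} \lvert x_k - y_k \rvert

% Distance between clusters A and B under each linkage criterion
D_{\text{single}}(A, B)   = \min_{x \in A,\; y \in B} d(x, y)
D_{\text{complete}}(A, B) = \max_{x \in A,\; y \in B} d(x, y)
D_{\text{average}}(A, B)  = \frac{1}{\lvert A \rvert \, \lvert B \rvert}
                            \sum_{x \in A} \sum_{y \in B} d(x, y)
```

At every iteration the two clusters with the smallest $D$ are merged; only the definition of $D$ differs between the three algorithms.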

Satoshi Takumi ◽  
Sadaaki Miyamoto

The aim of this paper is to study methods of twofold membership clustering using the nearest prototype and the nearest neighbor. The former uses K-means, whereas the latter extends the single linkage in agglomerative hierarchical clustering. The concept of inductive clustering is moreover used for both methods, which means that natural classification rules are derived as the results of clustering, a typical example of which is the Voronoi regions in K-means clustering. When the rule of nearest prototype allocation in K-means is replaced by nearest neighbor classification, we have inductive clustering related to the single linkage in agglomerative hierarchical clustering. The former method uses K-means or fuzzy c-means with noise clusters, whereby twofold memberships are derived; the latter method also derives two memberships in a different manner. Theoretical properties of both methods are studied. Illustrative examples show the implications and significance of this concept.
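The difference between the two allocation rules mentioned above can be illustrated with a short sketch. It shows only the basic nearest-prototype and nearest-neighbor classification rules, not the paper's twofold-membership construction; the function names and the Euclidean distance are assumptions made for the example.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_prototype(x, centers):
    """Index of the closest cluster center (the K-means allocation rule)."""
    return min(range(len(centers)), key=lambda k: dist(x, centers[k]))


def nearest_neighbor(x, points, labels):
    """Label of the closest already-clustered point (single-linkage-style rule)."""
    i = min(range(len(points)), key=lambda j: dist(x, points[j]))
    return labels[i]


# An elongated cluster 0 and a compact cluster 1.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (6, 0), (6, 1)]
labels = [0, 0, 0, 0, 0, 1, 1]
centers = [(2.0, 0.0), (6.0, 0.5)]  # the means of the two clusters

x = (4.5, 0.0)
by_prototype = nearest_prototype(x, centers)        # closer to center 1
by_neighbor = nearest_neighbor(x, points, labels)   # closest point is (4, 0)
```

For the new point `x`, the two rules disagree: the prototype rule assigns it to the compact cluster, while the neighbor rule follows the chain of the elongated one, which is the behavior associated with single linkage.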

2020 ◽  
Vol 18 (1) ◽  
pp. 75
Widyawati Widyawati ◽  
Wawan Laksito Yuly Saptomo ◽  
Yustina Retno Wahyu Utami

As more businesses emerge, companies need the right marketing strategy to provide the best service to customers. The first step is to identify the types of customers and design marketing strategies appropriate to each type. This research proposes clustering customers so that an appropriate strategy can be determined for each customer group. Clusters are formed using Agglomerative Hierarchical Clustering with the Average Linkage approach, with distances measured by Manhattan Distance. The variables are Recency, Frequency, and Monetary (RFM). Testing with the Silhouette coefficient shows that 7 clusters give the best result among the configurations from 2 to 20 clusters, because that configuration has the smallest negative value. Based on the Silhouette coefficient results, customer segmentation uses 7 clusters, with each cluster representing an existing customer type.
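As an illustration of the RFM variables, the sketch below derives them from a transaction list. The abstract does not define the variables, so the usual definitions are assumed here: recency as days since the last purchase, frequency as number of purchases, and monetary as total spend. The `(customer, date, amount)` record layout and the function name are hypothetical.

```python
from datetime import date


def rfm(transactions, today):
    """Map each customer to (recency in days, frequency, monetary total)."""
    table = {}
    for customer, day, amount in transactions:
        # Start from this purchase if the customer has not been seen yet.
        last, freq, money = table.get(customer, (day, 0, 0.0))
        table[customer] = (max(last, day), freq + 1, money + amount)
    return {
        customer: ((today - last).days, freq, money)
        for customer, (last, freq, money) in table.items()
    }


transactions = [
    ("a", date(2020, 1, 1), 100.0),
    ("a", date(2020, 1, 10), 50.0),
    ("b", date(2019, 12, 1), 20.0),
]
scores = rfm(transactions, today=date(2020, 1, 31))
```

Because the three variables sit on very different scales, they would normally be rescaled before computing Manhattan distances between customers; whether the study does so is not stated in the abstract.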


The Faculty of Industrial Technology at Universitas Ahmad Dahlan admits a large number of new students every year. At the same time, the proportion of students who graduate on time remains low, so the lecturer-to-student ratio keeps growing. As a further consequence, the number of users of campus facilities exceeds capacity and teaching and learning become ineffective. A step is therefore needed that groups student data, based on pre-university academic data and graduation data, using data mining techniques in order to identify the groups of students who graduate on time in the Faculty of Industrial Technology. This study uses hierarchical clustering. Its stages are Load Data, Cleaning Data, and Transformation Data using One Hot Encoding, followed by Euclidean Distance computation and Agglomerative Hierarchical Clustering. The clustering results are tested with the Silhouette Coefficient, followed by pattern evaluation and knowledge representation. The study yields 158 recommended students, all from the island of Java, with an average mathematics score >= 80 in the Informatics, Industrial, and Electrical datasets and >= 67 for Chemical. The numbers of recommended records are 43, 24, 19, and 72, respectively. The Silhouette Coefficient test gives very good results, with values by study program of 0.868, 0.883, 0.879, and 0.873, respectively.
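The transformation step can be sketched as follows: a categorical attribute is one-hot encoded so that Euclidean distance can be computed on it. The attribute used here (island of origin) and the function names are illustrative assumptions; the abstract does not list the actual columns.

```python
import math


def one_hot(values):
    """Return (sorted categories, one 0/1 vector per input value)."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    vectors = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1          # mark the position of this value's category
        vectors.append(row)
    return categories, vectors


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical categorical attribute for three students.
categories, vectors = one_hot(["Jawa", "Sumatra", "Jawa"])
```

With this encoding, two records that share a category are at distance 0 on that attribute and two that differ are at distance sqrt(2), regardless of which categories are involved.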

2017 ◽  
Vol 6 (2) ◽  
pp. 49-56
Dian Dharmayanti ◽  
Adam Mukharil Bachtiar ◽  
Andre Catur Prasetyo

As a favorite school of choice, SMPN 19 Bandung must maintain the quality of its education. Grade 9 students are usually required to attend reinforcement sessions or try-outs. Beyond reinforcement, the school should also form study groups. The problem is that the school usually assigns groups based only on the order of the attendance list. As a result, students who excel are placed in the same group as students who lag behind in a subject, and the stronger students may become bored because material they already understand is repeated so that the lagging students can catch up. In data mining, there is a method for dividing data into several groups based on the similarity of the data, namely clustering [5]. Clustering itself has several methods, one of which is Agglomerative Hierarchical Clustering (AHC) with the single linkage algorithm [7]. AHC with single linkage begins by deciding the number of groups to form, treating every data point as its own cluster, computing the distance matrix, finding the two closest clusters and merging them, then repeating from step 3 until the desired number of clusters remains [8]. Based on the test results, it can be concluded that the Study Group Builder application has helped the curriculum staff form suitable study groups based on the similarity of student grades within each group.
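The single-linkage procedure listed in this abstract can be sketched in a few lines. This is a naive illustration of those steps, not the application's actual code; the function name, the one-dimensional grade data, and the absolute-difference distance are assumptions made for the example.

```python
def single_linkage(points, n_clusters, dist):
    """Merge the two closest clusters until n_clusters remain.

    Returns a list of clusters, each a list of indices into points.
    """
    if not 1 <= n_clusters <= len(points):
        raise ValueError("n_clusters must be between 1 and len(points)")
    # Treat every data point as its own cluster.
    clusters = [[i] for i in range(len(points))]
    # Compute the pairwise distance matrix once.
    d = [[dist(p, q) for q in points] for p in points]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: cluster distance is the closest pair of members.
                gap = min(d[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or gap < best[0]:
                    best = (gap, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])   # merge the two closest clusters
        del clusters[b]
    return clusters


# Hypothetical grades for six students in one subject.
grades = [55, 58, 60, 85, 88, 90]
groups = single_linkage(grades, n_clusters=2, dist=lambda x, y: abs(x - y))
```

On these grades the result separates the three lower scores from the three higher ones. The nested search makes this sketch slow for large inputs, which is acceptable for class-sized data.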

Pratama Ryan Harnanda ◽  
Natalia Damastuti ◽  
Tresna Maulana Fahrudin

The blood needs of PMI (Indonesian Red Cross) in the Surabaya City area are sometimes erratic; the problem arises because demand for blood continues to increase while the blood supply runs low. The main objective of this research was to apply data mining to cluster the blood donor data at UTD-PMI Surabaya City Center in order to distinguish potential from non-potential donors, and to visualize the pattern of donor distribution in a Geographic Information System (GIS). Agglomerative Hierarchical Clustering was applied to the existing data of 8757 donors. The experimental results showed that the cluster quality was quite good, reaching a Silhouette Coefficient of 0.6065410. One interesting finding is that male private-sector employees with blood type O who live in the eastern part of Surabaya City are the most potential donors.

2020 ◽  
Aleksandra Urman ◽  
Stefania Ionescu ◽  
David Garcia ◽  
Anikó Hannák

BACKGROUND Since the beginning of the COVID-19 pandemic, scientists have been willing to share their results quickly to speed up the development of potential treatments and/or a vaccine. At the same time, traditional peer-review-based publication systems are not always able to process new research promptly. This has contributed to a surge in the number of medical preprints published since January 2020. In the absence of a vaccine, preventative measures such as social distancing are most helpful in slowing the spread of COVID-19. Their effectiveness can be undermined if the public does not comply with them. Hence, public discourse can have a direct effect on the progression of the pandemic. Research shows that social media discussions on COVID-19 are driven mainly by the findings from preprints, not peer-reviewed papers, highlighting the need to examine the ways medical preprints are shared and discussed online. OBJECTIVE We examine the patterns of medRxiv preprint sharing on Twitter to establish (1) whether the number of tweets linking to medRxiv increased with the advent of the COVID-19 pandemic; (2) which medical preprints were mentioned on Twitter most often; (3) whether medRxiv sharing patterns on Twitter exhibit political partisanship; (4) whether the discourse surrounding medical preprints among Twitter users has changed throughout the pandemic. METHODS The analysis is based on tweets (n=557,405) containing links to the medRxiv preprint repository that were posted between the creation of the repository in June 2019 and June 2020. The study relies on a combination of statistical techniques and text analysis methods. RESULTS Since January 2020, the number of tweets linking to medRxiv has increased drastically, peaking in April 2020 with a subsequent cool-down. Before the pandemic, preprints were shared predominantly by users we identify as medical professionals and scientists. 
After January 2020, other users, including politically-engaged ones, have increasingly started tweeting about medRxiv. Our findings indicate a political divide in sharing patterns of the top-10 most-tweeted preprints. All of them were shared more frequently by users who describe themselves as Republicans than by users who describe themselves as Democrats. Finally, we observe a change in the discourse around medRxiv preprints. Pre-pandemic tweets linking to them were predominantly using the word “preprint”. In February 2020 “preprint” was overtaken by the word “study”. Our analysis suggests this change is at least partially driven by politically-engaged users. CONCLUSIONS Widely shared medical preprints can have a direct effect on the public discourse around COVID-19, which in turn can affect society’s willingness to comply with preventative measures. This calls for an increased responsibility when dealing with medical preprints from all parties involved: scientists, preprint repositories, media, politicians, and social media companies.

2020 ◽  
Ethan Kaji ◽  
Maggie Bushman

BACKGROUND Adolescents with depression often turn to social media to express their feelings, for support, and for educational purposes. Little is known about how Reddit, a forum-based platform, compares to Twitter, a newsfeed platform, when it comes to content surrounding depression. OBJECTIVE The purpose of this study is to identify differences between Reddit and Twitter concerning how depression is discussed and represented online. METHODS A content analysis of Reddit posts and Twitter posts, using r/depression and #depression, identified signs of depression using the DSM-IV criteria. Other youth-related topics, including School, Family, and Social Activity, and the presence of medical or promotional content were also coded. The relative frequency of each code was then compared between platforms, as was the average DSM-IV score for each platform. RESULTS A total of 102 posts were included in this study: 53 Reddit posts and 49 Twitter posts. Findings suggest that Reddit has more content with signs of depression (92%) than Twitter (24%). Medical content appeared in 28.3% of Reddit posts compared with 18.4% of Twitter posts. Promotional content appeared in 53.1% of Twitter posts and in no Reddit posts. CONCLUSIONS Users with depression seem more willing to discuss their mental health on the subreddit r/depression than on Twitter. Twitter users also use #depression with a wider variety of topics, not all of which actually involve a case of depression.

2021 ◽  
pp. 004728162110078
Shanna Cameron ◽  
Alexandra Russell ◽  
Luke Brake ◽  
Katherine Fredlund ◽  
Angela Morris

This article engages with recent discussions in the field of technical communication that call for climate change research that moves beyond the believer/denier dichotomy. For this study, our research team coded 900 tweets about climate change and global warming for different emotions in order to understand how Twitter users rely on affect rhetorically. Our findings use quantitative content analysis to challenge current assumptions about writing and affect on social media, and our results indicate a number of arenas for future research on affect, global warming, and rhetoric.
