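Peringkasan Tweet Berdasarkan Trending Topic Twitter Dengan Pembobotan TF-IDF dan Single Linkage Agglomerative Hierarchical Clustering (Tweet Summarization Based on Twitter Trending Topics Using TF-IDF Weighting and Single Linkage Agglomerative Hierarchical Clustering)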

Author(s):  
Annisa Annisa ◽  
Yuda Munarko ◽  
Yufis Azhar

Trending topics is a Twitter feature that shows what users are discussing widely at a particular time. Each trending topic appears as a hashtag that can be selected by clicking it. However, the number of tweets for each trending topic can be very large, which makes it difficult to read all of the content. To make a topic easier to read, a small number of tweets can be selected to represent its main ideas. In this study, we applied single linkage agglomerative hierarchical clustering after first calculating the TF-IDF value of each word. We used 100 trending topics, each consisting of 50 tweets in Indonesian. For testing, we used 30 trending topics, each containing 2 to 9 sub-topics. As a result, each trending topic could be summarized into a shorter text of 2 to 9 tweets. The summary matched the one produced by a human expert exactly for 1 trending topic; for the remaining topics, it matched the expert summary only partially.
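The pipeline described in this abstract (TF-IDF weighting followed by single linkage agglomerative clustering) might be sketched as follows. This is a minimal illustration, not the authors' implementation: the cosine distance, the stopping rule (a fixed number of clusters), the function names, and the four toy tweets are all assumptions that do not appear in the abstract.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors

def cosine_distance(a, b):
    """1 - cosine similarity between two sparse vectors (assumed metric)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

def single_linkage(vectors, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Single linkage: the distance between two clusters is the smallest
    distance between any pair of their members.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(
                    cosine_distance(vectors[p], vectors[q])
                    for p in clusters[i]
                    for q in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical toy tweets, already tokenized (not data from the study).
tweets = [
    "flood jakarta today".split(),
    "flood jakarta worse".split(),
    "traffic jam heavy rain".split(),
    "heavy rain traffic jam".split(),
]
vectors = tfidf_vectors(tweets)
clusters = single_linkage(vectors, n_clusters=2)
print(clusters)
```

Each resulting cluster would correspond to one sub-topic; how a representative tweet is then chosen from each cluster is not stated in the abstract, so that step is left out here.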

2017 ◽  
Vol 470 (1) ◽  
pp. 576-591 ◽  
Author(s):  
Viktor Radović ◽  
Bojan Novaković ◽  
Valerio Carruba ◽  
Dušan Marčeta

Asteroid families are a valuable source of information for much asteroid-related research, provided that a reliable list of their members can be obtained. However, as the number of known asteroids grows rapidly, it becomes increasingly difficult to obtain a robust list of members of an asteroid family. Here, we propose a new approach to the problem, based on the well-known hierarchical clustering method. An additional step is introduced into the procedure in order to reduce the so-called chaining effect. The main idea is to prevent chaining through an already identified interloper. We show that in this way the number of potential interlopers among family members is significantly reduced. Moreover, we developed an automatic online portal that applies this procedure, i.e. generates a list of family members as well as a list of potential interlopers. The Asteroid Families Portal is freely available to all interested researchers.
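The idea of preventing chaining through an identified interloper can be illustrated with a toy example. The sketch below is an assumption-laden simplification: it uses plain Euclidean distance on 2-D points as a stand-in for whatever metric the authors use, it takes the set of interlopers as given (the abstract does not say how they are identified), and `grow_family` is a hypothetical name.

```python
import math

def grow_family(points, seed, cutoff, excluded=frozenset()):
    """Collect every point reachable from `seed` through steps shorter
    than `cutoff`, never stepping onto an index in `excluded`.

    This is single-linkage growth at a fixed distance cutoff; excluding
    an index stops it from acting as a bridge to further members.
    """
    family = {seed}
    frontier = [seed]
    while frontier:
        current = frontier.pop()
        for k in range(len(points)):
            if k in family or k in excluded:
                continue
            if math.dist(points[current], points[k]) < cutoff:
                family.add(k)
                frontier.append(k)
    return family

# Hypothetical points on a line, one unit apart: a pure chain.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]

# Plain single linkage chains through every point.
with_chaining = grow_family(points, seed=0, cutoff=1.5)

# If point 2 is flagged as an interloper, points 3 and 4 are no longer
# reachable, because their only link to the seed ran through point 2.
without_chaining = grow_family(points, seed=0, cutoff=1.5, excluded={2})

print(sorted(with_chaining), sorted(without_chaining))
```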


2017 ◽  
Vol 14 (1) ◽  
Author(s):  
Zdeněk Šulc ◽  
Martin Matějka ◽  
Jiří Procházka ◽  
Hana Řezanková

This paper thoroughly examines three recently introduced modifications of the Gower coefficient, which were designed for hierarchical clustering of data with mixed-type variables. In contrast to the original Gower coefficient, which, for nominal variables, only recognizes whether two categories match or not, the examined modifications offer three different approaches to measuring the similarity between categories. The examined dissimilarity measures are compared and evaluated with respect to the quality of their clusters, measured by three internal indices (Dunn, silhouette, McClain), and with respect to their classification abilities, measured by the Rand index. The comparison is performed on 810 generated datasets. In the analysis, the performance of the similarity measures is evaluated across different data characteristics (the number of variables, the number of categories, the distance between clusters, etc.) and different hierarchical clustering methods (average, complete, McQuitty and single linkage). As a result, two modifications are recommended for use in practice.
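For reference, the behaviour of the original Gower coefficient described above (simple match or mismatch for nominal variables) can be written as a dissimilarity in a few lines. This sketch covers only the original coefficient with equal variable weights; the three modifications are not specified in the abstract and are not reproduced here. The function name, the `kinds`/`ranges` arguments, and the sample records are illustrative assumptions.

```python
def gower_dissimilarity(a, b, kinds, ranges):
    """Original Gower dissimilarity between two mixed-type records.

    kinds[i]  is "num" for a numeric variable, anything else for nominal.
    ranges[i] is the observed range (max - min) of a numeric variable
              and is ignored for nominal variables.
    """
    total = 0.0
    for x, y, kind, r in zip(a, b, kinds, ranges):
        if kind == "num":
            # Range-normalized absolute difference, in [0, 1].
            total += abs(x - y) / r if r else 0.0
        else:
            # Nominal: 0 if the categories match, 1 otherwise.
            total += 0.0 if x == y else 1.0
    return total / len(kinds)

# Hypothetical records: one numeric and two nominal variables.
kinds = ("num", "nom", "nom")
ranges = (40.0, None, None)
d = gower_dissimilarity((10.0, "red", "A"), (20.0, "red", "B"), kinds, ranges)
print(d)  # (0.25 + 0 + 1) / 3
```

Because every nominal mismatch contributes the same value of 1, the original coefficient cannot express that some categories are closer to each other than others, which is the gap the examined modifications address.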


2017 ◽  
Vol 185 ◽  
pp. 15-28 ◽  
Author(s):  
Dekang Zhu ◽  
Dan P. Guralnik ◽  
Xuezhi Wang ◽  
Xiang Li ◽  
Bill Moran

2021 ◽  
Vol 15 (2) ◽  
pp. 63 ◽  
Author(s):  
Desy Exasanti ◽  
Arief Jananto

Clustering is a method of grouping data whose class labels are not known in advance, in order to discover new clusters from the observations. Clustering methods include centroid-based, hierarchical, density-based, and grid-based methods; this study uses a hierarchical method. A hierarchical method groups objects by building a hierarchy of clusters, although this hierarchy should not be read as an organizational hierarchy. We chose Agglomerative Hierarchical Clustering, a bottom-up approach in which each object starts as its own cluster and clusters are then merged iteratively into larger ones. The data are non-fire incident records from the Semarang City Fire Department (Dinas Pemadam Kebakaran Kota Semarang), which are used to group the areas where non-fire incidents are handled. The Fire Department handles more than fires: non-fire incidents include reptile evacuations, cat evacuations, rescues of accident victims, and so on. Non-fire data from 16 districts (kecamatan) of Semarang City in 2019 are tested with three algorithms: Single Linkage, Average Linkage, and Complete Linkage. Single Linkage merges clusters based on the smallest distance between data objects, Average Linkage on the average distance between data objects, and Complete Linkage on the largest distance. The implementation and visualization of the test data in this study use WEKA 3.8.4; WEKA (Waikato Environment for Knowledge Analysis) is software written in the Java programming language.
From the dataset of 380 records, a sample of 100 records was tested in WEKA using the Manhattan distance and 3 clusters. The results can be visualized as a dendrogram with the visualize tree feature, or as a plot with the visualize cluster assignments feature.
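The difference between the three linkage rules can be seen on a small example. The study itself used WEKA 3.8.4; the sketch below is an independent pure-Python illustration, not the WEKA procedure, and its six two-feature rows are hypothetical values chosen so that the linkages disagree, not data from the Fire Department.

```python
def manhattan(a, b):
    """Manhattan (city-block) distance between two numeric rows."""
    return sum(abs(x - y) for x, y in zip(a, b))

# How the pairwise distances between two clusters are reduced to one value.
LINKAGES = {
    "single": min,                            # smallest pairwise distance
    "complete": max,                          # largest pairwise distance
    "average": lambda ds: sum(ds) / len(ds),  # mean pairwise distance
}

def agglomerate(rows, n_clusters, linkage):
    """Bottom-up clustering: start with singletons, merge the closest pair."""
    combine = LINKAGES[linkage]
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                ds = [
                    manhattan(rows[p], rows[q])
                    for p in clusters[i]
                    for q in clusters[j]
                ]
                d = combine(ds)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)

# Hypothetical rows (two numeric features each).
rows = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.1), (1.6, 1.7), (3.5, 3.5), (3.7, 3.8)]

results = {name: agglomerate(rows, 3, name) for name in LINKAGES}
for name, clusters in results.items():
    print(name, clusters)
```

On these rows, single linkage attaches row 2 to the first group because it only needs one close neighbor, whereas complete and average linkage pair rows 2 and 3 instead.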


Author(s):  
Satoshi Takumi ◽  
Sadaaki Miyamoto

The aim of this paper is to study methods of twofold membership clustering using the nearest prototype and the nearest neighbor. The former uses K-means, whereas the latter extends single linkage in agglomerative hierarchical clustering. The concept of inductive clustering is also used for both methods: natural classification rules are derived as the results of clustering, a typical example being the Voronoi regions in K-means clustering. When the nearest prototype allocation rule in K-means is replaced by nearest neighbor classification, we obtain inductive clustering related to single linkage in agglomerative hierarchical clustering. The former method uses K-means or fuzzy c-means with noise clusters, whereby twofold memberships are derived; the latter method also derives two memberships, in a different manner. Theoretical properties of both methods are studied. Illustrative examples show the implications and significance of this concept.
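The contrast between the two allocation rules named above can be shown on a toy example. This sketch only illustrates nearest prototype versus nearest neighbor classification of a new point given existing clusters; it does not implement the paper's twofold memberships, which the abstract does not define. The cluster contents, the Euclidean distance, and the function names are assumptions.

```python
import math

def centroid(points):
    """Mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def nearest_prototype(x, clusters):
    """Assign x to the cluster whose centroid is closest (K-means rule)."""
    return min(clusters, key=lambda name: math.dist(x, centroid(clusters[name])))

def nearest_neighbor(x, clusters):
    """Assign x to the cluster containing its closest member
    (the rule related to single linkage)."""
    return min(
        clusters,
        key=lambda name: min(math.dist(x, p) for p in clusters[name]),
    )

# Hypothetical clusters: A is elongated, B is compact.
clusters = {
    "A": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)],
    "B": [(6.0, 0.0), (6.0, 1.0)],
}
x = (4.5, 0.0)

by_prototype = nearest_prototype(x, clusters)
by_neighbor = nearest_neighbor(x, clusters)
print(by_prototype, by_neighbor)
```

The new point sits next to the end of the elongated cluster A, so the nearest neighbor rule assigns it to A, while the nearest prototype rule assigns it to B because B's centroid is closer than A's.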


2016 ◽  
Vol 33 (1) ◽  
pp. 118-140 ◽  
Author(s):  
Alvaro Martínez-Pérez
