Agglomerative Hierarchical Clustering Without Reversals on Dendrograms Using Asymmetric Similarity Measures

Author(s):  
Satoshi Takumi ◽  
Sadaaki Miyamoto

Algorithms of agglomerative hierarchical clustering using asymmetric similarity measures are studied. Two different measures between two clusters are proposed, one of which generalizes the average linkage for symmetric similarity measures. Asymmetric dendrogram representation is considered, following earlier studies. It is proved that the proposed linkage methods for asymmetric measures produce no reversals in the dendrograms. Examples based on real data show how the methods work.

2021 ◽  
Vol 6 (1) ◽  
pp. 60-69
Author(s):  
Syabdan Dalimunthe ◽  
Anggi Hanafiah

Health is very precious. It can be maintained in many ways, one of which is keeping a proper diet. A proper diet keeps the immune system strong so that various diseases can be avoided, and it also keeps the body in a state of balanced nutrition in which all of its nutritional needs are met. Nutrient requirements include calories, protein, fat, carbohydrates, calcium, phosphorus, iron, vitamin A, vitamin B, and vitamin C, each measured per 100 grams of food. To make it easier to find the nutrients needed, a system was built that categorizes foods by their nutritional content and calculates average nutrient values using agglomerative hierarchical clustering with average linkage. The average linkage calculation produces groups of foods whose nutrient profiles resemble the nutrients being sought, as can be seen from their index, so that each group contains closely matching data.


2017 ◽  
Vol 14 (1) ◽  
Author(s):  
Zdeněk Šulc ◽  
Martin Matějka ◽  
Jiří Procházka ◽  
Hana Řezanková

This paper thoroughly examines three recently introduced modifications of the Gower coefficient, which were defined for data with mixed-type variables in hierarchical clustering. In contrast to the original Gower coefficient, which for nominal variables only recognizes whether two categories match or not, the examined modifications offer three different approaches to measuring the similarity between categories. The examined dissimilarity measures are compared and evaluated with respect to the quality of their clusters, measured by three internal indices (Dunn, silhouette, McClain), and with respect to their classification abilities, measured by the Rand index. The comparison is performed on 810 generated datasets. In the analysis, the performance of the similarity measures is evaluated across different data characteristics (the number of variables, the number of categories, the distance between clusters, etc.) and different hierarchical clustering methods (average, complete, McQuitty, and single linkage). As a result, two of the modifications are recommended for use in practice.
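For reference, the original Gower coefficient that these modifications start from can be sketched as below: numeric variables contribute 1 - |x_k - y_k| / R_k (with R_k the range of variable k over the dataset), and nominal variables contribute 1 on a match and 0 otherwise, averaged over all variables. This is a minimal sketch assuming equal variable weights and no missing values; the three modifications examined in the paper are not reproduced, and the function names are hypothetical.

```python
def numeric_ranges(data, kinds):
    """Range R_k = max - min of each numeric column; None for nominal columns."""
    ranges = []
    for k, kind in enumerate(kinds):
        if kind == "num":
            column = [row[k] for row in data]
            ranges.append(max(column) - min(column))
        else:
            ranges.append(None)
    return ranges


def gower_similarity(x, y, kinds, ranges):
    """Original Gower similarity between two mixed-type records (equal weights)."""
    total = 0.0
    for xk, yk, kind, rk in zip(x, y, kinds, ranges):
        if kind == "num":
            # Range-normalised agreement; a constant column counts as a full match.
            s = 1.0 if rk == 0 else 1.0 - abs(xk - yk) / rk
        elif kind == "nom":
            # Simple matching: only "same category or not" is recognised.
            s = 1.0 if xk == yk else 0.0
        else:
            raise ValueError(f"unknown variable kind: {kind}")
        total += s
    return total / len(kinds)


def gower_dissimilarity(x, y, kinds, ranges):
    """Dissimilarity 1 - S, usable as input to a hierarchical clustering routine."""
    return 1.0 - gower_similarity(x, y, kinds, ranges)


if __name__ == "__main__":
    data = [(1.0, "red", 10.0), (3.0, "blue", 20.0), (2.0, "red", 30.0)]
    kinds = ["num", "nom", "num"]
    ranges = numeric_ranges(data, kinds)
    print(gower_similarity(data[0], data[2], kinds, ranges))
```

The nominal branch is exactly where the examined modifications differ: instead of a hard 0/1 match they assign graded similarities between distinct categories.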


2021 ◽  
Vol 15 (2) ◽  
pp. 63
Author(s):  
Desy Exasanti ◽  
Arief Jananto

Abstract: Clustering is a method of grouping data to discover new clusters from observations. There are many clustering methods, namely centroid-based, hierarchical, density-based, and grid-based methods; this study uses a hierarchical method. A hierarchical method groups objects by building a hierarchy of clusters, although this does not mean the result always resembles an organizational hierarchy. Agglomerative Hierarchical Clustering was chosen; it is a bottom-up approach in which each object under test starts as a single-object cluster, and iterations then merge these into progressively larger clusters. The data are non-fire incident records from the Semarang City Fire Department, used to group the areas in which non-fire incidents are handled. The Fire Department handles not only fires but many other incidents that firefighters can respond to; non-fire incidents include reptile evacuation, cat evacuation, rescue of accident victims, and so on. Non-fire data from 16 districts (kecamatan) of Semarang City in 2019 are tested with three algorithms: Single Linkage, Average Linkage, and Complete Linkage. Single Linkage merges clusters using the smallest distance between data objects, Average Linkage uses the average distance between data objects, and Complete Linkage uses the largest distance. The implementation and visualization of the experiments in this study use WEKA 3.8.4 (Waikato Environment for Knowledge Analysis), software written in the Java programming language.
From a dataset of 380 records, a sample of 100 records was tested in WEKA using the Manhattan distance with 3 clusters. The results can be visualized as a dendrogram with the Visualize tree feature, and as a chart with the Visualize cluster assignments feature.
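The three linkage rules described above (smallest, average, and largest pairwise distance) can be sketched in plain Python with the Manhattan distance, stopping once k clusters remain, as in the study's 3-cluster setting. This is a minimal, deliberately naive sketch, not WEKA's HierarchicalClusterer; the names `manhattan`, `linkage_distance`, and `agglomerate` are hypothetical.

```python
from itertools import combinations


def manhattan(a, b):
    """Manhattan (city-block) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def linkage_distance(c1, c2, points, method):
    """Distance between two clusters (lists of point indices) under a linkage rule."""
    dists = [manhattan(points[i], points[j]) for i in c1 for j in c2]
    if method == "single":
        return min(dists)  # closest pair
    if method == "complete":
        return max(dists)  # farthest pair
    if method == "average":
        return sum(dists) / len(dists)  # mean over all cross-cluster pairs
    raise ValueError(f"unknown linkage: {method}")


def agglomerate(points, k, method="average"):
    """Bottom-up clustering: start with singletons, merge the closest pair until k remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage_distance(clusters[p[0]], clusters[p[1]], points, method),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because combinations() yields a < b
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
    for m in ("single", "average", "complete"):
        print(m, agglomerate(pts, 3, m))
```

Recomputing every pairwise linkage on each iteration keeps the logic easy to follow but is slow; library implementations update a distance matrix incrementally instead.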


CAUCHY ◽  
2015 ◽  
Vol 4 (1) ◽  
pp. 25
Author(s):  
Alfi Fadliana ◽  
Fachrur Rozi

Agglomerative hierarchical clustering is a cluster analysis method whose primary purpose is to group objects based on their characteristics; it starts from the individual objects and successively merges them until all objects are fused into a single cluster. Agglomerative hierarchical clustering methods include single linkage, complete linkage, average linkage, and Ward's method. This research compared these four methods to find the best cluster solution for classifying the regencies/cities of East Java province by the quality of “Keluarga Berencana” (KB, family planning) services. Based on the cophenetic correlation coefficient, the best cluster solution is produced by the average linkage method, which yields four clusters with different characteristics. Cluster 1 is in an “extremely bad condition” on the qualification of KB clinics and the competence of KB service personnel. Cluster 2 is in a “good condition” on the qualification of KB clinics and a “bad condition” on the competence of KB service personnel. Cluster 3 is in a “bad condition” on the qualification of KB clinics and a “medium condition” on the competence of KB service personnel. Cluster 4 is in a “medium condition” on the qualification of KB clinics and a “good condition” on the competence of KB service personnel.
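The cophenetic correlation coefficient used above to pick the best method is the Pearson correlation between the original pairwise distances and the cophenetic distances, i.e. the dendrogram height at which each pair of objects is first merged. A minimal sketch follows, assuming Euclidean distances and covering only single, complete, and average linkage (Ward's method needs a different update rule and is omitted); all function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cophenetic_matrix(points, method="average"):
    """Return (original distances, cophenetic distances) as n x n lists."""
    n = len(points)
    d = [[euclidean(p, q) for q in points] for p in points]
    coph = [[0.0] * n for _ in range(n)]
    clusters = [[i] for i in range(n)]

    def link(c1, c2):
        ds = [d[i][j] for i in c1 for j in c2]
        if method == "single":
            return min(ds)
        if method == "complete":
            return max(ds)
        if method == "average":
            return sum(ds) / len(ds)
        raise ValueError(f"unknown linkage: {method}")

    while len(clusters) > 1:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        height = link(clusters[a], clusters[b])
        # Every pair split across the two merged clusters first meets at this height.
        for i in clusters[a]:
            for j in clusters[b]:
                coph[i][j] = coph[j][i] = height
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return d, coph


def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


def cophenetic_correlation(points, method="average"):
    d, coph = cophenetic_matrix(points, method)
    pairs = list(combinations(range(len(points)), 2))
    return pearson([d[i][j] for i, j in pairs], [coph[i][j] for i, j in pairs])


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (10.0,), (11.0,), (30.0,)]
    for m in ("single", "average", "complete"):
        print(m, round(cophenetic_correlation(pts, m), 4))
```

A value closer to 1 means the dendrogram preserves the original pairwise distances more faithfully, which is the criterion the study uses to compare linkage methods.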


2021 ◽  
Vol 18 (1) ◽  
pp. 130-140
Author(s):  
Yanuwar Reinaldi ◽  
Nurissaidah Ulinnuha ◽  
Moh. Hafiyusholeh

Community welfare is one of the key concerns of a region and is also the essence of national development. Welfare in Indonesia is quite unevenly distributed, especially in East Java. One way to map the regions of East Java by the welfare of their people is clustering, and hierarchical clustering is one such method for grouping data. This study compares three hierarchical clustering methods (single linkage, complete linkage, and average linkage) to determine which is best. The results show that average linkage with three clusters performs best, with a silhouette index of 0.6054. The first cluster contains 23 regions, the cities/districts with the highest community welfare; the second cluster contains 11 regions, the cities/districts with moderate social welfare; and the third cluster contains 4 regions, the cities/districts with the lowest community welfare.
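The silhouette index used above to rank the methods averages, over all objects, s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from object i to the rest of its own cluster and b(i) is the smallest mean distance from i to any other cluster. The sketch below assumes Euclidean distances and the usual convention that objects in singleton clusters score 0; the function name is hypothetical and the study's actual tooling is not stated.

```python
from math import dist  # Python 3.8+


def silhouette(points, labels):
    """Mean silhouette width of a labelling (Euclidean distance)."""
    labels = list(labels)
    n = len(points)
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a(i): mean distance to the other members of i's cluster
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b(i): mean distance to the nearest other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in range(n) if labels[j] == c)
            / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        m = max(a, b)
        scores.append(0.0 if m == 0 else (b - a) / m)
    return sum(scores) / n


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
    print(silhouette(pts, [0, 0, 1, 1]))  # well separated: close to 1
    print(silhouette(pts, [0, 1, 0, 1]))  # poor labelling: negative
```

Values near 1 indicate compact, well-separated clusters, so a score such as 0.6054 suggests a reasonably clear structure.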


2019 ◽  
Vol 14 (2) ◽  
pp. 148-156
Author(s):  
Nighat Noureen ◽  
Sahar Fazal ◽  
Muhammad Abdul Qadir ◽  
Muhammad Tanvir Afzal

Background: Specific combinations of histone modifications (HMs) that contribute to the histone code hypothesis lead to various biological functions. Various studies have used HM combinations to divide the genome into different regions, which have been classified as chromatin states, mostly with Hidden Markov Model (HMM)-based techniques. Chromatin studies use data from Next Generation Sequencing (NGS) platforms. Chromatin states based on histone modification combinatorics are annotated by mapping them to functional regions of the genome. So far, the numbers of states predicted by HMM tools have been justified biologically. Objective: The present study aimed to provide a computational scheme to identify the underlying hidden states in the data under consideration. Methods: We proposed HCVS, a computational scheme based on hierarchical clustering and a visualization strategy, to achieve this objective. Results: We tested the proposed scheme on a real dataset of nine cell types comprising nine chromatin marks. The approach successfully identified the number of states for various possibilities. The results were also compared with one of the existing models and showed good correlation. Conclusion: The HCVS model not only helps in deciding the optimal number of states for particular data but also justifies the results biologically, thereby linking the computational and biological aspects.


Mathematics ◽  
2021 ◽  
Vol 9 (4) ◽  
pp. 370
Author(s):  
Shuangsheng Wu ◽  
Jie Lin ◽  
Zhenyu Zhang ◽  
Yushu Yang

The fuzzy clustering algorithm has become a research hotspot in many fields because of its good clustering performance and data expression ability. However, little research has focused on clustering hesitant fuzzy linguistic term sets (HFLTSs). To fill this research gap, we extend clustering to hesitant fuzzy linguistic information. A hesitant fuzzy linguistic agglomerative hierarchical clustering algorithm is proposed. Furthermore, we propose a hesitant fuzzy linguistic Boole matrix clustering algorithm and compare the two clustering algorithms. The proposed algorithms are applied to judicial execution, where they provide decision support for the executive judge in determining the focus of investigation and control. A clustering example verifies the effectiveness of the clustering algorithm in the context of hesitant fuzzy linguistic decision information.

