Statistical properties of the single linkage hierarchical clustering estimator

This paper thoroughly examines three recently introduced modifications of the Gower coefficient, which were designed for data with mixed-type variables in hierarchical clustering. In contrast to the original Gower coefficient, which for nominal variables only recognizes whether two categories match or not, the examined modifications offer three different approaches to measuring the similarity between categories. The examined dissimilarity measures are compared and evaluated regarding the quality of their clusters, measured by three internal indices (Dunn, silhouette, McClain), and regarding their classification abilities, measured by the Rand index. The comparison is performed on 810 generated datasets. In the analysis, the performance of the similarity measures is evaluated by different data characteristics (the number of variables, the number of categories, the distance between clusters, etc.) and by different hierarchical clustering methods (average, complete, McQuitty and single linkage). As a result, two modifications are recommended for use in practice.
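For reference, the original Gower dissimilarity that these modifications start from can be sketched as follows. This is a minimal sketch, assuming unweighted variables and no missing values; the function name `gower_dissimilarity` and the `kinds`/`ranges` parameters are illustrative, not taken from the paper. The nominal term is exactly the match/mismatch rule that the three modifications replace with graded category similarities, which are not shown here.

```python
def gower_dissimilarity(x, y, kinds, ranges):
    """Original (unweighted) Gower dissimilarity between records x and y.

    kinds[j] is "nominal" or "numeric"; ranges[j] is max - min of numeric
    variable j over the whole dataset (ignored for nominal variables).
    Assumes no missing values.
    """
    total = 0.0
    for xj, yj, kind, r in zip(x, y, kinds, ranges):
        if kind == "nominal":
            # Original Gower: categories either match (0) or differ (1).
            total += 0.0 if xj == yj else 1.0
        elif kind == "numeric":
            # Absolute difference scaled to [0, 1] by the variable's range.
            total += abs(xj - yj) / r if r > 0 else 0.0
        else:
            raise ValueError(f"unknown variable kind: {kind}")
    # Average the per-variable contributions.
    return total / len(kinds)


# Hypothetical records: (color, size, group)
kinds = ["nominal", "numeric", "nominal"]
ranges = [0.0, 4.0, 0.0]  # size spans 4 units in the (hypothetical) dataset
print(gower_dissimilarity(("red", 1.0, "A"), ("blue", 3.0, "A"), kinds, ranges))  # 0.5
```

In the example, the color mismatch contributes 1, the size difference contributes 2/4 = 0.5 and the matching group contributes 0, so the dissimilarity is (1 + 0.5 + 0)/3 = 0.5.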

Analysis of Regional Clustering Results for Non-Fire Incidents Using Agglomerative Hierarchical Clustering in Semarang

Jurnal Tekno Kompak

10.33365/jtk.v15i2.1166

2021

Vol 15 (2)

pp. 63

Author(s):

Desy Exasanti

Arief Jananto

Keyword(s):

Hierarchical Clustering

Manhattan Distance

Agglomerative Hierarchical Clustering

Single Linkage

Bottom Up

Environment Analysis

Complete Linkage

Average Linkage

Abstract: Clustering is a method of grouping data whose class labels are not known in advance in order to discover new clusters from observations. There are many clustering methods, including centroid-based, hierarchical, density-based and grid-based methods; this study uses a hierarchical method. A hierarchical method groups objects by building a hierarchy of clusters, although this hierarchy is not necessarily depicted like an organizational hierarchy. Agglomerative Hierarchical Clustering was chosen. It is a bottom-up approach: each object under test starts as a singleton cluster, and clusters are then merged iteratively into larger clusters. The data are non-fire incident records from the Semarang City Fire Department (Dinas Pemadam Kebakaran Kota Semarang), which are used to group regions by their handling of non-fire incidents. The fire department handles not only fires; firefighters also respond to many other situations, and non-fire incidents include reptile evacuation, cat evacuation, rescue of accident victims and more. The 2019 non-fire data from 16 districts of Semarang are tested with three algorithms: Single Linkage, Average Linkage and Complete Linkage. Single Linkage merges clusters based on the smallest distance between data objects, Average Linkage based on the average distance between data objects, and Complete Linkage based on the largest distance. The experiments in this study are implemented and visualized with WEKA 3.8.4 (Waikato Environment for Knowledge Analysis), a software package written in the Java programming language.
From the dataset of 380 records, a sample of 100 records was tested in WEKA using the Manhattan distance and 3 clusters. The results can be visualized as a dendrogram with the "Visualize tree" feature, and as a plot with the "Visualize cluster assignments" feature.
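The three linkage rules in this abstract differ only in how the distance between two clusters is computed from the pairwise object distances. The sketch below illustrates this with the Manhattan distance. It is a naive illustration that recomputes cluster distances at every step, not WEKA's implementation, and the names `agglomerative` and `manhattan` and the toy points are hypothetical.

```python
from itertools import combinations


def manhattan(a, b):
    """Manhattan (city-block) distance between two points."""
    return sum(abs(p - q) for p, q in zip(a, b))


def agglomerative(points, n_clusters, linkage="single"):
    """Naive bottom-up clustering; returns clusters as lists of point indices."""
    # Every object starts as its own singleton cluster.
    clusters = [[i] for i in range(len(points))]
    # Pairwise Manhattan distances between objects.
    d = [[manhattan(p, q) for q in points] for p in points]

    def cluster_distance(a, b):
        pair_dists = [d[i][j] for i in a for j in b]
        if linkage == "single":      # smallest distance between members
            return min(pair_dists)
        if linkage == "complete":    # largest distance between members
            return max(pair_dists)
        if linkage == "average":     # mean distance between members
            return sum(pair_dists) / len(pair_dists)
        raise ValueError(f"unknown linkage: {linkage}")

    while len(clusters) > n_clusters:
        # Merge the closest pair of clusters under the chosen linkage (i < j).
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
for method in ("single", "average", "complete"):
    # On this toy data each method prints [[0, 1], [2, 3], [4]].
    print(method, agglomerative(points, n_clusters=3, linkage=method))
```

On well-separated data like this the three rules agree; they diverge on elongated or noisy clusters, where single linkage tends to chain clusters together through intermediate points.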

Nearest Prototype and Nearest Neighbor Clustering with Twofold Memberships Based on Inductive Property

Journal of Advanced Computational Intelligence and Intelligent Informatics

10.20965/jaciii.2013.p0504

2013

Vol 17 (4)

pp. 504-510

Author(s):

Satoshi Takumi

...

Sadaaki Miyamoto

Keyword(s):

Hierarchical Clustering

Nearest Neighbor

Classification Rules

Agglomerative Hierarchical Clustering

Single Linkage

Natural Classification

Nearest Neighbor Classification

Voronoi Regions

Inductive Property

Neighbor Classification

The aim of this paper is to study methods of twofold membership clustering using the nearest prototype and the nearest neighbor. The former uses K-means, whereas the latter extends single linkage in agglomerative hierarchical clustering. The concept of inductive clustering is moreover applied to both methods, which means that natural classification rules are derived as the results of clustering; a typical example is the Voronoi regions in K-means clustering. When the nearest-prototype allocation rule in K-means is replaced by nearest neighbor classification, we obtain inductive clustering related to single linkage in agglomerative hierarchical clustering. The former method uses K-means or fuzzy c-means with noise clusters, whereby twofold memberships are derived; the latter method also derives two memberships, in a different manner. Theoretical properties of both methods are studied. Illustrative examples show the implications and significance of this concept.
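The difference between the two allocation rules shows up on a toy example: a point near the end of an elongated cluster can be closer to the centroid of a compact neighboring cluster, while its nearest already-clustered point still belongs to the elongated one. The sketch below assumes crisp labels and squared Euclidean distance and omits the noise clusters and twofold memberships studied in the paper; all function names and data are illustrative.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def centroid(pts):
    """Coordinate-wise mean of a list of points."""
    n = len(pts)
    return tuple(sum(coord) / n for coord in zip(*pts))


def nearest_prototype(x, centroids):
    """K-means-style rule: index of the closest centroid (its Voronoi region)."""
    return min(range(len(centroids)), key=lambda k: sq_dist(x, centroids[k]))


def nearest_neighbor(x, points, labels):
    """Single-linkage-style rule: label of the closest already-clustered point."""
    i = min(range(len(points)), key=lambda i: sq_dist(x, points[i]))
    return labels[i]


# Hypothetical data: elongated cluster 0 along the x-axis, compact cluster 1.
points = [(float(t), 0.0) for t in range(7)] + [(9.0, 0.0), (9.0, 1.0)]
labels = [0] * 7 + [1] * 2
centroids = [centroid(points[:7]), centroid(points[7:])]  # (3.0, 0.0) and (9.0, 0.5)

x = (6.5, 0.0)
print(nearest_prototype(x, centroids))       # 1: closer to cluster 1's centroid
print(nearest_neighbor(x, points, labels))   # 0: nearest point (6, 0) is in cluster 0
```

The nearest-prototype rule partitions the space into Voronoi regions around the centroids, whereas the nearest-neighbor rule follows the shape of the clustered points, which is why it pairs naturally with single linkage.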

Statistical estimation for Single Linkage Hierarchical Clustering

2015 IEEE International Conference on Cyber Technology in Automation, Control, and Intelligent Systems (CYBER)

10.1109/cyber.2015.7288035

2015

Cited By ~ 1

Author(s):

Dekang Zhu

Guralnik Dan

Xuezhi Wang

Xiang Li

Bill Moran

Keyword(s):

Hierarchical Clustering

Statistical Estimation

Single Linkage

On the Properties of α-Unchaining Single Linkage Hierarchical Clustering

Journal of Classification

10.1007/s00357-016-9198-2

2016

Vol 33 (1)

pp. 118-140

Cited By ~ 5

Author(s):

Alvaro Martínez-Pérez

Keyword(s):

Hierarchical Clustering

Single Linkage

A hierarchical clustering algorithm and an improvement of the single linkage criterion to deal with noise

Expert Systems with Applications

10.1016/j.eswa.2019.03.031

2019

Vol 128

pp. 96-108

Cited By ~ 9

Author(s):

Frédéric Ros

Serge Guillaume

Keyword(s):

Hierarchical Clustering

Clustering Algorithm

Single Linkage

Hierarchical Clustering Algorithm

A publicly accessible database for Clostridioides difficile genome sequences supports tracing of transmission chains and epidemics

Microbial Genomics

10.1099/mgen.0.000410

2020

Vol 6 (8)

Cited By ~ 5

Author(s):

Martinique Frentrup

Zhemin Zhou

Matthias Steglich

Jan P. Meier-Kolthoff

Markus Göker

...

Keyword(s):

Hierarchical Clustering

Type Species

Global Scale

Comparative Genomic

Polymorphism Analysis

Genome Sequences

Single Linkage

Content Type

Link Type

Clostridioides Difficile

Clostridioides difficile is the primary infectious cause of antibiotic-associated diarrhea. Local transmissions and international outbreaks of this pathogen have been previously elucidated by bacterial whole-genome sequencing, but comparative genomic analyses at the global scale were hampered by the lack of specific bioinformatic tools. Here we introduce a publicly accessible database within EnteroBase (http://enterobase.warwick.ac.uk) that automatically retrieves and assembles C. difficile short-reads from the public domain, and calls alleles for core-genome multilocus sequence typing (cgMLST). We demonstrate that comparable levels of resolution and precision are attained by EnteroBase cgMLST and single-nucleotide polymorphism analysis. EnteroBase currently contains 18 254 quality-controlled C. difficile genomes, which have been assigned to hierarchical sets of single-linkage clusters by cgMLST distances. This hierarchical clustering is used to identify and name populations of C. difficile at all epidemiological levels, from recent transmission chains through to epidemic and endemic strains. Moreover, it puts newly collected isolates into phylogenetic and epidemiological context by identifying related strains among all previously published genome data. For example, HC2 clusters (i.e. chains of genomes with pairwise distances of up to two cgMLST alleles) were statistically associated with specific hospitals (P<10⁻⁴) or single wards (P=0.01) within hospitals, indicating they represented local transmission clusters. We also detected several HC2 clusters spanning more than one hospital that by retrospective epidemiological analysis were confirmed to be associated with inter-hospital patient transfers. In contrast, clustering at level HC150 correlated with k-mer-based classification and was largely compatible with PCR ribotyping, thus enabling comparisons to earlier surveillance data.
EnteroBase enables contextual interpretation of a growing collection of assembled, quality-controlled C. difficile genome sequences and their associated metadata. Hierarchical clustering rapidly identifies database entries that are related at multiple levels of genetic distance, facilitating communication among researchers, clinicians and public-health officials who are combatting disease caused by C. difficile.
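Levels such as HC2 and HC150 amount to cutting a single-linkage hierarchy at fixed allele-distance thresholds. At a threshold t, single-linkage clusters are exactly the connected components of the graph that links two genomes whenever their distance is at most t, so one union-find pass is enough. The sketch below assumes allele profiles given as equal-length tuples with `None` for missing loci, which are simply skipped when counting differences; this missing-data rule and the function names are illustrative assumptions, not EnteroBase's implementation.

```python
def allele_distance(a, b):
    """Number of loci where two allele profiles differ; loci missing (None) in either are skipped."""
    return sum(1 for x, y in zip(a, b)
               if x is not None and y is not None and x != y)


def single_linkage_at(profiles, threshold):
    """Single-linkage clusters at a distance threshold, as labels 0, 1, 2, ...

    Genomes joined by any chain of pairwise distances <= threshold end up
    with the same label (connected components, found with union-find).
    """
    n = len(profiles)
    parent = list(range(n))

    def find(i):
        # Walk to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if allele_distance(profiles[i], profiles[j]) <= threshold:
                parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]


# Hypothetical five-locus profiles.
profiles = [
    (1, 1, 1, 1, 1),
    (1, 1, 1, 2, 2),  # 2 alleles from genome 0
    (1, 1, 3, 3, 2),  # 2 alleles from genome 1, 3 from genome 0
    (9, 9, 9, 9, 9),  # unrelated
]
print(single_linkage_at(profiles, 0))  # [0, 1, 2, 3]
print(single_linkage_at(profiles, 2))  # [0, 0, 0, 1]
```

At threshold 2, genomes 0 and 2 share a cluster although they differ at three loci, because genome 1 links them. This chaining is what the abstract means by HC2 clusters being "chains of genomes".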

Tweet Summarization Based on Twitter Trending Topics Using TF-IDF Weighting and Single Linkage Agglomerative Hierarchical Clustering

Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control

10.22219/kinetik.v1i1.7

2016

Vol 1 (1)

Author(s):

Annisa Annisa

Yuda Munarko

Yufis Azhar

Keyword(s):

Hierarchical Clustering

Main Idea

Single Linkage

Human Expert