hierarchical agglomerative clustering
Recently Published Documents


TOTAL DOCUMENTS

168
(FIVE YEARS 64)

H-INDEX

14
(FIVE YEARS 3)

Electronics ◽  
2022 ◽  
Vol 11 (2) ◽  
pp. 267
Author(s):  
Félix Morales ◽  
Miguel García-Torres ◽  
Gustavo Velázquez ◽  
Federico Daumas-Ladouce ◽  
Pedro E. Gardel-Sotomayor ◽  
...  

Correctly defining and grouping electrical feeders is of great importance for electrical system operators. In this paper, we compare two clustering techniques, K-means and hierarchical agglomerative clustering, applied to real data from the east region of Paraguay. The raw data were pre-processed, resulting in four data sets: (i) weekly feeder demand, (ii) monthly feeder demand, (iii) a statistical feature set extracted from the original data and (iv) a seasonal and daily consumption feature set obtained by considering the characteristics of the Paraguayan load curve. Across the four data sets, two clustering algorithms, two distance metrics and five linkage criteria, a total of 36 models were assessed using the Silhouette, Davies–Bouldin and Calinski–Harabasz index scores. K-means with the seasonal feature data set showed the best performance on all three validation indices, with a configuration of six clusters.
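To make the comparison above concrete, the following is a minimal sketch of agglomerative clustering with a selectable linkage criterion, scored with the Silhouette index. It is not the authors' pipeline: the function names, the three linkage options shown (the paper evaluates five) and the toy vectors are all assumptions made for illustration, and only the Python standard library is used.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+

def agglomerative(points, n_clusters, linkage="average"):
    """Naive agglomerative clustering; returns a list of index lists."""
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(a, b):
        pairwise = [dist(points[i], points[j]) for i in a for j in b]
        if linkage == "single":
            return min(pairwise)
        if linkage == "complete":
            return max(pairwise)
        return sum(pairwise) / len(pairwise)  # average linkage

    while len(clusters) > n_clusters:
        # Merge the two closest clusters under the chosen linkage.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: cluster_distance(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: combinations guarantees a < b
    return clusters

def silhouette(points, clusters):
    """Mean silhouette coefficient; needs at least two clusters.

    Singleton clusters score 0 by convention.
    """
    scores = []
    for ci, members in enumerate(clusters):
        for i in members:
            if len(members) == 1:
                scores.append(0.0)
                continue
            # a: mean distance to the other members of the same cluster.
            a = sum(dist(points[i], points[j]) for j in members if j != i)
            a /= len(members) - 1
            # b: mean distance to the nearest other cluster.
            b = min(
                sum(dist(points[i], points[j]) for j in other) / len(other)
                for cj, other in enumerate(clusters)
                if cj != ci
            )
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical toy feature vectors, not data from the paper.
toy = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.9, 5.3)]
for linkage in ("single", "complete", "average"):
    found = agglomerative(toy, n_clusters=2, linkage=linkage)
    print(linkage, sorted(sorted(c) for c in found), silhouette(toy, found))
```

On well-separated data such as this toy set, all three linkages recover the same two groups; differences between linkage criteria typically only show up on noisier or elongated clusters.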


2021 ◽  
Author(s):  
Daniel Bakkelund

Partial orders and directed acyclic graphs are commonly recurring data structures that arise naturally in numerous domains and applications and are used to represent ordered relations between entities in the domains. Examples are task dependencies in a project plan, transaction order in distributed ledgers and execution sequences of tasks in computer programs, just to mention a few. We study the problem of order preserving hierarchical clustering of this kind of ordered data. That is, if we have $a<b$ in the original data and denote their respective clusters by $[a]$ and $[b]$, then we shall have $[a]<[b]$ in the produced clustering. The clustering is similarity based and uses standard linkage functions, such as single- and complete linkage, and is an extension of classical hierarchical clustering. To achieve this, we develop a novel theory that extends classical hierarchical clustering to strictly partially ordered sets. We define the output from running classical hierarchical clustering on strictly ordered data to be partial dendrograms; sub-trees of classical dendrograms with several connected components. We then construct an embedding of partial dendrograms over a set into the family of ultrametrics over the same set. An optimal hierarchical clustering is defined as the partial dendrogram corresponding to the ultrametric closest to the original dissimilarity measure, measured in the p-norm. Thus, the method is a combination of classical hierarchical clustering and ultrametric fitting. A reference implementation is employed for experiments on both synthetic random data and real world data from a database of machine parts. When compared to existing methods, the experiments show that our method excels both in cluster quality and order preservation.
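The link between dendrograms and ultrametrics can be illustrated with the classical, unordered case. The sketch below computes the ultrametric induced by single linkage (the minimax-path closure of a dissimilarity matrix), checks the ultrametric inequality, and measures the p-norm gap to the original dissimilarity. It deliberately ignores the order-preservation constraint and the partial dendrograms that the paper introduces; the function names and the example matrix are assumptions for illustration only.

```python
def single_linkage_ultrametric(d):
    """Minimax-path closure of a symmetric dissimilarity matrix.

    u[i][j] is the smallest threshold at which i and j are connected,
    i.e. the height at which single linkage merges them.
    """
    n = len(d)
    u = [row[:] for row in d]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                u[i][j] = min(u[i][j], max(u[i][k], u[k][j]))
    return u

def is_ultrametric(u, tol=1e-12):
    """Check u(i, j) <= max(u(i, k), u(k, j)) for all triples."""
    n = len(u)
    return all(
        u[i][j] <= max(u[i][k], u[k][j]) + tol
        for i in range(n) for j in range(n) for k in range(n)
    )

def p_norm_gap(d, u, p=2):
    """p-norm of the difference between d and u over unordered pairs."""
    n = len(d)
    total = sum(
        abs(d[i][j] - u[i][j]) ** p
        for i in range(n) for j in range(i + 1, n)
    )
    return total ** (1 / p)

# Hypothetical dissimilarity matrix over four items.
d = [
    [0, 1, 4, 5],
    [1, 0, 2, 6],
    [4, 2, 0, 3],
    [5, 6, 3, 0],
]
u = single_linkage_ultrametric(d)
print(u)
print(is_ultrametric(d), is_ultrametric(u))
print(p_norm_gap(d, u, p=2))
```

The single-linkage ultrametric is only one candidate; the paper's criterion selects, among the admissible partial dendrograms, the one whose ultrametric minimises this kind of p-norm gap.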


2021 ◽  
Vol 11 (23) ◽  
pp. 11122
Author(s):  
Thomas Märzinger ◽  
Jan Kotík ◽  
Christoph Pfeifer

This paper is the result of the first-phase, inter-disciplinary work of a multi-disciplinary research project (“Urban pop-up housing environments and their potential as local innovation systems”) consisting of energy engineers and waste managers, landscape architects and spatial planners, innovation researchers and technology assessors. The project aims at globally analyzing and describing existing pop-up housings (PUH) and at developing modeling and assessment tools for sustainable, energy-efficient and socially innovative temporary housing solutions (THS), especially for sustainable and resilient urban structures. The present paper presents an effective application of hierarchical agglomerative clustering (HAC) to the analysis of large datasets typically derived from field studies. As can be shown, the method, although well known and successfully established in (soft) computing science, can also be used very constructively as a potential urban planning tool. The main aim of the underlying multi-disciplinary research project was to analyze and structure THS and PUE in depth. Multiple aspects must be considered when characterizing and classifying such environments. A thorough (global) web survey of PUH and an analysis of the scientific literature describing PUH and THS have been performed. Moreover, of the several approaches and methods tested for classifying PUH, hierarchical clustering algorithms functioned well when properly selected metrics and cut-off criteria were applied. Specifically, the ‘Minkowski’ metric and the ‘Calinski-Harabasz’ criterion, used as a clustering index, showed the best overall results in clustering the inhomogeneous PUH data. Several additional algorithms/functions derived from the field of hierarchical clustering have also been tested to exploit their potential in interpreting and graphically analyzing particular structures and dependencies in the resulting clusters. Here, the (math.) significance ‘S’ and the (math.) proportion ‘P’ were found to yield the most interpretable and comprehensible results when analyzing the given set of researched PUH objects (n = 85) together with their properties (n > 190). The resulting easily readable graphs clearly demonstrate the applicability and usability of hierarchical clustering and its derivative algorithms for scientifically sound building classification tasks in urban planning, by effectively managing huge inhomogeneous building datasets.
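The two ingredients named above, a Minkowski metric and the Calinski-Harabasz criterion, can be sketched as follows. This is an illustrative sketch rather than the study's code: the abstract does not state the Minkowski order, so `p=3` is an arbitrary default, and the points and candidate partitions are hypothetical. The Calinski-Harabasz index is the ratio of between-cluster to within-cluster dispersion, each normalised by its degrees of freedom; a higher value suggests a better cut of the dendrogram.

```python
def minkowski(x, y, p=3):
    """Minkowski distance of order p (p=1 Manhattan, p=2 Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

def _mean(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def _squared(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def calinski_harabasz(points, labels):
    """Calinski-Harabasz index; requires 1 < k < n clusters."""
    n, k = len(points), len(set(labels))
    overall = _mean(points)
    between = within = 0.0
    for label in set(labels):
        members = [pt for pt, lab in zip(points, labels) if lab == label]
        centre = _mean(members)
        between += len(members) * _squared(centre, overall)
        within += sum(_squared(pt, centre) for pt in members)
    return (between / (k - 1)) / (within / (n - k))

# Hypothetical objects and two candidate cuts of a dendrogram.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
two_clusters = [0, 0, 0, 1, 1, 1]
three_clusters = [0, 0, 1, 2, 2, 2]

print(minkowski((0, 0), (3, 4), p=2))
print(calinski_harabasz(points, two_clusters))
print(calinski_harabasz(points, three_clusters))
```

Comparing the index across candidate cut-offs and keeping the maximum is one common way to turn the criterion into a stopping rule.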


2021 ◽  
Vol 2083 (4) ◽  
pp. 042022
Author(s):  
Zeming Wei ◽  
Chufeng Liang ◽  
Hua Tang

At present, municipal solid waste (MSW) collection is based on a divided-region operation mode, which has many deficiencies. This paper proposes a cross-regional operation scheme. Through initial assignment, type labeling, and reassignment, the scheme uses an improved hierarchical agglomerative clustering (IHAC) algorithm and a garbage collecting route optimization (GCRO) algorithm to realize intelligent allocation of garbage and route scheduling of collection vehicles. The experimental results demonstrate that the proposed scheme improves the utilization of vehicle resources, reduces the operating cost, realizes the balanced allocation of garbage, and solves the problems caused by the limitations of the original operation scheme, which demonstrates the feasibility and effectiveness of cross-regional operation.


2021 ◽  
Vol 2021 (1) ◽  
pp. 557-566
Author(s):  
Edy Widodo ◽  
Putri Ermayani ◽  
Latifah Nur Laila ◽  
Asdan Tri Madani

As a complex and multidimensional problem, poverty is a development priority. Each country has its own varied causes of poverty, one of which was SARS-CoV-2 throughout 2020. Its effect was an increase in the poverty rate across the provinces of Indonesia. In poverty alleviation efforts, the provinces of Indonesia can therefore be grouped by poverty level to identify which provinces deserve priority treatment. Many related studies have been carried out, but they focused on a single province rather than on the country as a whole. This study analyzes all provinces in Indonesia using a more complete set of poverty indicators: the poverty severity index, the poverty depth index, the literacy rate, mean years of schooling, expected years of schooling, the open unemployment rate, and the percentage of poor population. The grouping was performed using hierarchical agglomerative clustering. The study yielded three poverty-level clusters: cluster 1, the low level, with 25 provinces; cluster 2, the medium level, with 7 provinces; and cluster 3, the high level, with 2 members.


2021 ◽  
Vol 23 (4) ◽  
pp. 1-13
Author(s):  
Jatinderkumar R. Saini ◽  
Prafulla Bharat Bafna

Document management is a need of the present era, and managing documents in regional languages is a significant and largely untouched area. A Marathi corpus consisting of news is processed to form the Group Entity document matrix Marathi (GEDMM), the Vector space model for Marathi (VSMM) and the Hysynset Vector space model for Marathi (HSVSMM). GEDMM uses entity groups extracted using a conditional random field (CRF). The frequent terms are used to construct VSMM using TF-IDF. HSVSMM uses synsets built from hypernyms-hyponyms and synonyms. GEDMM and HSVSMM apply dimension reduction by selecting significant feature groups. Hierarchical agglomerative clustering (HAC) is used and a dendrogram is produced to visualize the clusters. The performance analysis is carried out using several measures: entropy, purity, misclassification error and accuracy. The clusters produced using GEDMM show the minimum entropy and the highest purity. A random forest classifier is applied and its results are evaluated using misclassification error and accuracy.
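Purity and entropy, two of the evaluation measures mentioned above, compare cluster assignments against known category labels. The sketch below uses their common textbook definitions, which may differ in detail from the ones used in the paper; the function names and the tiny labelled example are hypothetical.

```python
from collections import Counter
from math import log2

def _label_counts(cluster_ids, true_labels):
    """Map each cluster id to a Counter of the true labels inside it."""
    by_cluster = {}
    for cluster, label in zip(cluster_ids, true_labels):
        by_cluster.setdefault(cluster, Counter())[label] += 1
    return by_cluster

def purity(cluster_ids, true_labels):
    """Fraction of documents that carry their cluster's majority label."""
    counts = _label_counts(cluster_ids, true_labels)
    majority = sum(max(c.values()) for c in counts.values())
    return majority / len(true_labels)

def entropy(cluster_ids, true_labels):
    """Size-weighted mean label entropy per cluster, in bits."""
    total = len(true_labels)
    result = 0.0
    for counts in _label_counts(cluster_ids, true_labels).values():
        size = sum(counts.values())
        h = -sum((c / size) * log2(c / size) for c in counts.values())
        result += (size / total) * h
    return result

# Hypothetical clustering of six news documents.
cluster_ids = [0, 0, 0, 1, 1, 1]
true_labels = ["sports", "sports", "politics",
               "politics", "politics", "politics"]

print(purity(cluster_ids, true_labels))
print(entropy(cluster_ids, true_labels))
```

Higher purity and lower entropy both indicate clusters that align more closely with the reference categories, which is the sense in which the GEDMM clusters are reported as best.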


Author(s):  
Nicholas Monath ◽  
Kumar Avinava Dubey ◽  
Guru Guruganesh ◽  
Manzil Zaheer ◽  
Amr Ahmed ◽  
...  

2021 ◽  
Author(s):  
Liwei Chang ◽  
Alberto Perez ◽  
Ramon Alain Miranda-Quintana

We present new algorithms to classify structural ensembles of macromolecules, based on the recently proposed extended similarity measures. Molecular Dynamics provides a wealth of structural information on systems of biological interest. As computer power increases, we capture larger ensembles and larger conformational transitions between states. Typically, structural clustering provides the statistical mechanics treatment of the system to identify relevant biological states. The key advantage of our approach is that the newly introduced extended similarity indices reduce the computational complexity of assessing the similarity of a set of structures from O(N^2) to O(N). Here we take advantage of this favorable cost to develop several highly efficient techniques, including a linear-scaling algorithm to determine the medoid of a set (which we effectively use to select the most representative structure of a cluster). Moreover, we use our extended similarity indices as a linkage criterion in a novel hierarchical agglomerative clustering algorithm. We apply these new metrics to analyze the ensembles of several systems of biological interest such as folding and binding of macromolecules (peptide, protein, DNA-protein). In particular, we design a new workflow that is capable of identifying the most important conformations contributing to the protein folding process. We show excellent performance in the resulting clusters (surpassing traditional linkage criteria), along with faster performance and an efficient cost function to identify when to merge clusters.
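The idea of replacing an O(N^2) pairwise scan with an O(N) computation can be illustrated with a simpler, well-known analogue. The sketch below does not implement the extended similarity indices of the paper; it swaps in squared Euclidean distance, for which the summed distance from a point to all others equals N times its squared distance to the centroid plus a constant, so the medoid is the point nearest the centroid. Function names and data are hypothetical.

```python
def _squared(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def medoid_quadratic(points):
    """Index minimising the summed squared distance to all points, O(N^2)."""
    costs = [sum(_squared(p, q) for q in points) for p in points]
    return min(range(len(points)), key=costs.__getitem__)

def medoid_linear(points):
    """Same medoid in O(N), assuming squared Euclidean distance.

    Uses sum_j |x_i - x_j|^2 = N * |x_i - c|^2 + sum_j |x_j - c|^2,
    where c is the centroid, so only the distance to c matters.
    """
    n = len(points)
    centroid = [sum(col) / n for col in zip(*points)]
    return min(range(n), key=lambda i: _squared(points[i], centroid))

# Hypothetical 2-D "conformations"; real ensembles would use
# structural coordinates or fingerprints.
frames = [(0, 0), (1, 0), (0, 1), (4, 4), (1, 1)]
print(medoid_quadratic(frames), medoid_linear(frames))
```

Both functions select the same representative here; the linear version avoids ever forming the pairwise distance matrix, which is the property the paper's approach exploits with its own similarity measure.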

