Scalable Hierarchical Agglomerative Clustering

The scientific community is actively developing new models and methods to help reach the ambitious target set by UN SDG 7: universal access to electricity by 2030. Efficient planning of distribution networks is a complex, multivariate task, which is usually split into multiple subproblems to reduce the number of variables. The present work addresses the problem of optimal secondary substation siting by means of different clustering techniques. In contrast with the majority of approaches found in the literature, which are devoted to the planning of MV grids in already electrified urban areas, this work focuses on greenfield planning in rural areas. The K-means algorithm, hierarchical agglomerative clustering, and a method based on optimal weighted tree partitioning are adapted to the problem and run on two real case studies with different population densities. The algorithms are compared in terms of several indicators useful for assessing the feasibility of the solutions found. The algorithms proved effective in addressing some of the crucial aspects of substation siting and constitute relevant improvements over the classic K-means approach found in the literature. However, when the load density is very low, it is very challenging to reconcile an acceptable geographical span of the area served by a single substation with a substation power high enough to justify the installation. In other words, well-known standards adopted in industrialized countries do not fit developing countries’ requirements.

Identifying organ dysfunction trajectory-based subphenotypes in critically ill patients with COVID-19

Scientific Reports ◽

10.1038/s41598-021-95431-7 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Chang Su ◽

Zhenxing Xu ◽

Katherine Hoffman ◽

Parag Goyal ◽

Monika M. Safford ◽

...

Keyword(s):

New York ◽

Respiratory Failure ◽

SOFA Score ◽

Severity Of Illness ◽

Agglomerative Clustering ◽

Baseline Severity ◽

Organ Systems ◽

Hierarchical Agglomerative Clustering ◽

Dynamic Time ◽

Post Intubation

COVID-19-associated respiratory failure offers the unprecedented opportunity to evaluate the differential host response to a uniform pathogenic insult. Understanding whether there are distinct subphenotypes of severe COVID-19 may offer insight into its pathophysiology. The Sequential Organ Failure Assessment (SOFA) score is an objective and comprehensive measure of dysfunction severity across six organ systems, i.e., cardiovascular, central nervous system, coagulation, liver, renal, and respiration. Our aim was to identify and characterize distinct subphenotypes of COVID-19 critical illness defined by the post-intubation trajectory of SOFA score. Intubated COVID-19 patients at two hospitals in New York City were leveraged as development and validation cohorts. Patients were grouped into mild, intermediate, and severe strata by their baseline post-intubation SOFA. Hierarchical agglomerative clustering was performed within each stratum to detect subphenotypes based on similarities amongst SOFA score trajectories evaluated by Dynamic Time Warping. Distinct worsening and recovering subphenotypes were identified within each stratum, which had distinct 7-day post-intubation SOFA progression trends. Patients in the worsening subphenotypes had a higher mortality than those in the recovering subphenotypes within each stratum (mild stratum, 29.7% vs. 10.3%, p = 0.033; intermediate stratum, 29.3% vs. 8.0%, p = 0.002; severe stratum, 53.7% vs. 22.2%, p < 0.001). Pathophysiologic biomarkers associated with progression were distinct at each stratum, including findings suggestive of inflammation in low baseline severity of illness versus hemophagocytic lymphohistiocytosis in higher baseline severity of illness. The findings suggest that there are clear worsening and recovering subphenotypes of COVID-19 respiratory failure after intubation, which are more predictive of outcomes than baseline severity of illness. Distinct progression biomarkers at differential baseline severity of illness suggest a heterogeneous pathobiology in the progression of COVID-19 respiratory failure.
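
The combination described in this abstract, pairwise Dynamic Time Warping distances fed into hierarchical agglomerative clustering, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the absolute-difference step cost, the average linkage, and the function names `dtw_distance` and `agglomerate` are all assumptions made for illustration, and any trajectories passed in would be toy data rather than patient records.

```python
from itertools import combinations


def dtw_distance(a, b):
    """Classic dynamic time warping between two numeric sequences,
    using the absolute difference as the per-step cost (an assumption)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # repeat b[j-1]
                                    cost[i][j - 1],      # repeat a[i-1]
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def agglomerate(series, n_clusters):
    """Naive average-linkage agglomerative clustering on DTW distances.
    Returns clusters as sorted lists of indices into `series`."""
    dist = {(i, j): dtw_distance(series[i], series[j])
            for i, j in combinations(range(len(series)), 2)}

    def linkage(c1, c2):
        # Average of all pairwise distances between the two clusters.
        pairs = [dist[min(i, j), max(i, j)] for i in c1 for j in c2]
        return sum(pairs) / len(pairs)

    clusters = [[i] for i in range(len(series))]
    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge it.
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x stays valid
    return sorted(sorted(c) for c in clusters)
```

Called on a handful of invented 7-day score sequences, for example two rising and two falling ones, `agglomerate(trajectories, 2)` would be expected to separate the rising group from the falling group. The quadratic search over cluster pairs keeps the sketch short and is only suitable for small inputs.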

Hierarchical Agglomerative Clustering

Encyclopedia of Systems Biology ◽

10.1007/978-1-4419-9863-7_1371 ◽

2013 ◽

pp. 886-887 ◽

Cited By ~ 28

Author(s):

Marie Lisandra Zepeda-Mendoza ◽

Osbaldo Resendis-Antonio

Keyword(s):

Agglomerative Clustering ◽

Hierarchical Agglomerative Clustering

An Approach for Fast Hierarchical Agglomerative Clustering Using Graphics Processors with CUDA

Advances in Knowledge Discovery and Data Mining - Lecture Notes in Computer Science ◽

10.1007/978-3-642-13672-6_4 ◽

2010 ◽

pp. 35-42 ◽

Cited By ~ 4

Author(s):

S. A. Arul Shalom ◽

Manoranjan Dash ◽

Minh Tue

Keyword(s):

Graphics Processors ◽

Agglomerative Clustering ◽

Hierarchical Agglomerative Clustering

Chromatographic, Chemometric and Antioxidant Assessment of the Equivalence of Granules and Herbal Materials of Angelicae Sinensis Radix

Medicines ◽

10.3390/medicines7060035 ◽

2020 ◽

Vol 7 (6) ◽

pp. 35

Author(s):

Valentina Razmovski-Naumovski ◽

Xian Zhou ◽

Ho Yee Wong ◽

Antony Kam ◽

Jarryd Pearson ◽

...

Keyword(s):

Ferulic Acid ◽

Caffeic Acid ◽

Radical Scavenging ◽

Principal Component ◽

Ultra Performance Liquid Chromatography ◽

Array Detector ◽

Agglomerative Clustering ◽

Antioxidant Power ◽

Hierarchical Agglomerative Clustering ◽

Angelicae Sinensis

Background: Granules are a popular way of administering herbal decoctions. However, there are no standardised quality control methods for granules, with few studies comparing the granules to traditional herbal decoctions. This study developed a multi-analytical platform to compare the quality of granule products to herb/decoction pieces of Angelicae Sinensis Radix (Danggui). Methods: A validated ultra-performance liquid chromatography coupled with photodiode array detector (UPLC-PDA) method quantitatively compared the aqueous extracts. Hierarchical agglomerative clustering analysis (HCA) and principal component analysis (PCA) clustered the samples according to three chemical compounds: ferulic acid, caffeic acid and Z-ligustilide. Ferric ion-reducing antioxidant power (FRAP) and 2,2-diphenyl-1-picrylhydrazyl radical scavenging capacity (DPPH) assessed the antioxidant activity of the samples. Results: HCA and PCA allocated the samples into two main groups: granule products and herb/decoction pieces. Greater differentiation between the samples was obtained with three chemical markers compared to using one marker. The herb/decoction pieces group showed comparatively higher extraction yields and significantly higher DPPH and FRAP (p < 0.05), which were positively correlated with caffeic acid and ferulic acid, respectively. Conclusions: The results confirm the need for the quality assessment of granule products using more than one chemical marker for widespread practitioner and consumer use.

Adjacency-constrained hierarchical clustering of a band similarity matrix with application to genomics

Algorithms for Molecular Biology ◽

10.1186/s13015-019-0157-4 ◽

2019 ◽

Vol 14 (1) ◽

Cited By ~ 1

Author(s):

Christophe Ambroise ◽

Alia Dehman ◽

Pierre Neuvial ◽

Guillem Rigaill ◽

Nathalie Vialaneix

Keyword(s):

Linear Complexity ◽

Association Studies ◽

Small Time ◽

Genome Wide Association Studies ◽

Similarity Matrix ◽

Agglomerative Clustering ◽

Genome Wide ◽

Sample Data ◽

Hierarchical Agglomerative Clustering ◽

Time And Space Complexity

Background: Genomic data analyses such as Genome-Wide Association Studies (GWAS) or Hi-C studies are often faced with the problem of partitioning chromosomes into successive regions based on a similarity matrix of high-resolution, locus-level measurements. An intuitive way of doing this is to perform a modified Hierarchical Agglomerative Clustering (HAC), where only adjacent clusters (according to the ordering of positions within a chromosome) are allowed to be merged. A major practical drawback of this method is its quadratic time and space complexity in the number of loci, which is typically of the order of $10^4$ to $10^5$ for each chromosome. Results: By assuming that the similarity between physically distant objects is negligible, we are able to propose an implementation of adjacency-constrained HAC with quasi-linear complexity. This is achieved by pre-calculating specific sums of similarities, and storing candidate fusions in a min-heap. Our illustrations on GWAS and Hi-C datasets demonstrate the relevance of this assumption, and show that this method highlights biologically meaningful signals. Thanks to its small time and memory footprint, the method can be run on a standard laptop in minutes or even seconds. Availability and implementation: Software and sample data are available as an R package, adjclust, that can be downloaded from the Comprehensive R Archive Network (CRAN).
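
The idea of merging only adjacent clusters while keeping candidate fusions in a min-heap can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the adjclust algorithm: it clusters a one-dimensional sequence of scalar values with a Ward-style merge cost rather than a band similarity matrix, it omits the pre-computed sums of similarities mentioned in the abstract, and the names `ward_cost` and `adjacent_hac` are hypothetical.

```python
import heapq


def ward_cost(n1, s1, n2, s2):
    """Increase in within-cluster sum of squares if two clusters,
    each summarised by (size, sum of values), are merged."""
    return n1 * n2 / (n1 + n2) * (s1 / n1 - s2 / n2) ** 2


def adjacent_hac(values, n_clusters):
    """Merge only neighbouring clusters until n_clusters remain.
    Returns a sorted list of (first_index, last_index) segments."""
    n = len(values)
    size = {i: 1 for i in range(n)}            # live cluster id -> size
    total = {i: float(v) for i, v in enumerate(values)}
    first = {i: i for i in range(n)}           # first position covered
    left = {i: (i - 1 if i > 0 else None) for i in range(n)}
    right = {i: (i + 1 if i < n - 1 else None) for i in range(n)}

    # Candidate fusions between neighbours, cheapest first.
    heap = [(ward_cost(1, total[i], 1, total[i + 1]), i, i + 1)
            for i in range(n - 1)]
    heapq.heapify(heap)

    next_id = n
    while len(size) > n_clusters and heap:
        _, a, b = heapq.heappop(heap)
        if a not in size or b not in size:
            continue  # stale entry: one side was already merged away
        c, next_id = next_id, next_id + 1
        size[c] = size.pop(a) + size.pop(b)
        total[c] = total.pop(a) + total.pop(b)
        first[c] = first.pop(a)
        first.pop(b)
        left[c], right[c] = left.pop(a), right.pop(b)
        left.pop(b)
        right.pop(a)
        # Re-link neighbours and push the two new candidate fusions.
        if left[c] is not None:
            l = left[c]
            right[l] = c
            heapq.heappush(
                heap, (ward_cost(size[l], total[l], size[c], total[c]), l, c))
        if right[c] is not None:
            r = right[c]
            left[r] = c
            heapq.heappush(
                heap, (ward_cost(size[c], total[c], size[r], total[r]), c, r))
    return sorted((first[c], first[c] + size[c] - 1) for c in size)
```

On the toy input `[1.0, 1.1, 0.9, 5.0, 5.2, 9.0, 9.1, 8.9]` with three clusters, the sketch returns the segments `(0, 2)`, `(3, 4)` and `(5, 7)`. Each merge creates a fresh cluster id, so outdated heap entries are detected lazily when popped instead of being deleted in place; every merge then costs a logarithmic number of heap operations rather than a scan over all pairs.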

A faster estimation method for the probability of informed trading using hierarchical agglomerative clustering

Quantitative Finance ◽

10.1080/14697688.2015.1023336 ◽

2015 ◽

Vol 15 (11) ◽

pp. 1805-1821 ◽

Cited By ~ 10

Author(s):

Quan Gan ◽

Wang Chun Wei ◽

David Johnstone

Keyword(s):

Estimation Method ◽

Informed Trading ◽

Agglomerative Clustering ◽

Hierarchical Agglomerative Clustering ◽

Probability Of Informed Trading

Application and Evaluation of Hierarchical Agglomerative Clustering in Wireless Sensor Networks

Sensor and Ad-Hoc Networks - Lecture Notes in Electrical Engineering ◽

10.1007/978-0-387-77320-9_13 ◽

2008 ◽

pp. 255-276 ◽

Cited By ~ 1

Author(s):

Chenjuan Zhou ◽

Chung-Horng Lung

Keyword(s):

Wireless Sensor Networks ◽

Sensor Networks ◽

Wireless Sensor ◽

Agglomerative Clustering ◽

Hierarchical Agglomerative Clustering

Radar Emission Sources Identification Based on Hierarchical Agglomerative Clustering for Large Data Sets

Journal of Sensors ◽

10.1155/2016/1879327 ◽

2016 ◽

Vol 2016 ◽

pp. 1-9 ◽

Cited By ~ 21

Author(s):

Janusz Dudczyk

Keyword(s):

Clustering Algorithm ◽

Large Data ◽

Large Data Sets ◽

Emission Sources ◽

Data Sets ◽

Agglomerative Clustering ◽

Distinctive Features ◽

Identification Process ◽

Hierarchical Agglomerative Clustering ◽

Repetition Interval

More advanced recognition methods, which may recognize particular copies of radars of the same type, are called identification. The identification of radar devices is a more specialized task which requires methods based on the analysis of distinctive features. These features are extracted from the signals coming from the identified devices. Such a process is called Specific Emitter Identification (SEI). Identifying radar emission sources with classic techniques based on the statistical analysis of basic measurable signal parameters, such as Radio Frequency, Amplitude, Pulse Width, or Pulse Repetition Interval, is not sufficient for SEI problems. This paper presents a method of hierarchical data clustering which is used in the process of radar identification. The Hierarchical Agglomerative Clustering Algorithm (HACA) based on the Generalized Agglomerative Scheme (GAS), implemented and used in the research method, is parameterized; therefore, it is possible to compare the results. The results of clustering are presented as dendrograms. The grouping and identification results obtained with HACA are compared with other SEI methods in order to assess their usefulness and effectiveness for systems of the ESM/ELINT class.
