agglomerative clustering
Recently Published Documents


TOTAL DOCUMENTS: 482 (FIVE YEARS 185)

H-INDEX: 27 (FIVE YEARS 5)

2022 ◽  
Vol 3 (1) ◽  
pp. 1-28
Author(s):  
Giorgio Grani ◽  
Andrea Lenzi ◽  
Paola Velardi

Social media analytics can considerably contribute to understanding health conditions beyond clinical practice, by capturing patients’ discussions and feelings about their quality of life in relation to disease treatments. In this article, we propose a methodology to support a detailed analysis of the therapeutic experience in patients affected by a specific disease, as it emerges from health forums. As a use case to test the proposed methodology, we analyze the experience of patients affected by hypothyroidism and their reactions to standard therapies. Our approach is based on a data extraction and filtering pipeline, a novel topic detection model named Generative Text Compression with Agglomerative Clustering Summarization (GTCACS), and an in-depth data analytic process. We advance the state of the art on automated detection of adverse drug reactions (ADRs) since, rather than simply detecting and classifying positive or negative reactions to a therapy, we are capable of providing a fine characterization of patients along different dimensions, such as co-morbidities, symptoms, and emotional states.


Electronics ◽  
2022 ◽  
Vol 11 (2) ◽  
pp. 267
Author(s):  
Félix Morales ◽  
Miguel García-Torres ◽  
Gustavo Velázquez ◽  
Federico Daumas-Ladouce ◽  
Pedro E. Gardel-Sotomayor ◽  
...  

Correctly defining and grouping electrical feeders is of great importance for electrical system operators. In this paper, we compare two different clustering techniques, K-means and hierarchical agglomerative clustering, applied to real data from the east region of Paraguay. The raw data were pre-processed, resulting in four data sets, namely, (i) a weekly feeder demand, (ii) a monthly feeder demand, (iii) a statistical feature set extracted from the original data and (iv) a seasonal and daily consumption feature set obtained considering the characteristics of the Paraguayan load curve. Considering the four data sets, two clustering algorithms, two distance metrics and five linkage criteria, a total of 36 models were assessed using the Silhouette, Davies–Bouldin and Calinski–Harabasz index scores. The K-means algorithm with the seasonal feature data set showed the best performance on the Silhouette, Calinski–Harabasz and Davies–Bouldin validation index scores, with a configuration of six clusters.
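
The comparison described above pairs a linkage criterion with a validation index. A minimal sketch of that idea follows, using only the Python standard library. It is not the authors' pipeline: the toy 2-D points stand in for feeder feature vectors, only three common linkage criteria are shown (the abstract does not list the five it used), and the function names are hypothetical.

```python
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def linkage_distance(c1, c2, points, linkage):
    # All pairwise distances between members of the two clusters.
    d = [euclidean(points[i], points[j]) for i in c1 for j in c2]
    if linkage == "single":
        return min(d)
    if linkage == "complete":
        return max(d)
    if linkage == "average":
        return sum(d) / len(d)
    raise ValueError(f"unknown linkage: {linkage}")

def agglomerative(points, n_clusters, linkage="average"):
    # Start with one cluster per point, then repeatedly merge the closest pair.
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: linkage_distance(
                clusters[ab[0]], clusters[ab[1]], points, linkage
            ),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # a < b, so index a is unaffected
    labels = [0] * len(points)
    for k, members in enumerate(clusters):
        for i in members:
            labels[i] = k
    return labels

def silhouette(points, labels):
    # Mean silhouette coefficient; singleton clusters score 0 by convention.
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(euclidean(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in by_cluster.values())
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical toy data: two well-separated groups of three points each.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_labels = {
    name: agglomerative(toy_points, 2, name)
    for name in ("single", "complete", "average")
}
toy_scores = {name: silhouette(toy_points, lab) for name, lab in toy_labels.items()}
```

Running the same scoring loop over several data sets, metrics and linkage criteria gives the kind of model grid the abstract describes; a higher silhouette indicates tighter, better-separated clusters.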


Author(s):  
Sowmya HK ◽  
R. J. Anandhi

The World Wide Web contains a vast number of pages and URLs that supply users with a great amount of content. In an era of rapidly growing information, analysing users’ browsing behaviour is an important task. Web usage mining techniques are applied to web server logs to analyse user behaviour. Identifying user sessions is one of the key and most demanding tasks in the pre-processing stage of web usage mining. This paper highlights two important shortcomings of existing session identification methods, such as time-based and referrer-based sessionization. The first concerns the comparison of the current request’s referrer field with the URL of the previous request. The second concerns session creation: requests are split into new sessions or merged into a single session according to threshold values for page stay time and session time. The authors therefore developed an enhanced semantic-distance-based session identification algorithm that tackles these issues of traditional session identification methods. The enhanced semantic method has an accuracy of 84 percent, which is higher than that of the time-based and time-referrer-based session identification approaches. The authors also used adapted K-means and hierarchical agglomerative clustering algorithms to improve the prediction of user browsing patterns. Clusters were found using a weighted dissimilarity matrix, which is calculated from two key parameters: page weight and session weight. The Dunn index and Davies-Bouldin index are then used to evaluate the clusters. Experimental results show that purer and more accurate session clusters are formed when the adapted clustering algorithms are applied to the weighted sessions rather than to the sessions obtained from traditional sessionization algorithms. The accuracy of the semantic session clusters is higher than that of the clusters of sessions obtained using traditional sessionization.
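
As an illustration of how clusters built on a dissimilarity matrix can be scored, here is a minimal sketch of the Dunn index (smallest between-cluster dissimilarity divided by largest within-cluster dissimilarity). The abstract does not give the formula for the weighted dissimilarity, so the matrix below is hypothetical and simply taken as given; the function name is also an assumption.

```python
def dunn_index(dist, labels):
    """Dunn index from a symmetric dissimilarity matrix and cluster labels.

    Higher is better: clusters are far apart relative to their diameter.
    """
    n = len(labels)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    between = [dist[i][j] for i, j in pairs if labels[i] != labels[j]]
    within = [dist[i][j] for i, j in pairs if labels[i] == labels[j]]
    if not between:
        raise ValueError("need at least two clusters")
    max_diameter = max(within, default=0.0)
    if max_diameter == 0:
        return float("inf")  # every cluster is a single point or all duplicates
    return min(between) / max_diameter

# Hypothetical dissimilarities between four sessions (not from the paper).
session_dist = [
    [0.0, 0.25, 1.0, 0.75],
    [0.25, 0.0, 0.75, 1.0],
    [1.0, 0.75, 0.0, 0.125],
    [0.75, 1.0, 0.125, 0.0],
]
good = dunn_index(session_dist, [0, 0, 1, 1])
bad = dunn_index(session_dist, [0, 1, 0, 1])
```

Because the function works directly on a precomputed matrix, it applies to any dissimilarity, weighted or not, as long as the matrix is symmetric.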


2021 ◽  
Author(s):  
Daniel Bakkelund

Partial orders and directed acyclic graphs are commonly recurring data structures that arise naturally in numerous domains and applications and are used to represent ordered relations between entities in the domains. Examples are task dependencies in a project plan, transaction order in distributed ledgers and execution sequences of tasks in computer programs, just to mention a few. We study the problem of order preserving hierarchical clustering of this kind of ordered data. That is, if we have $a<b$ in the original data and denote their respective clusters by $[a]$ and $[b]$, then we shall have $[a]<[b]$ in the produced clustering. The clustering is similarity based and uses standard linkage functions, such as single- and complete linkage, and is an extension of classical hierarchical clustering. To achieve this, we develop a novel theory that extends classical hierarchical clustering to strictly partially ordered sets. We define the output from running classical hierarchical clustering on strictly ordered data to be partial dendrograms; sub-trees of classical dendrograms with several connected components. We then construct an embedding of partial dendrograms over a set into the family of ultrametrics over the same set. An optimal hierarchical clustering is defined as the partial dendrogram corresponding to the ultrametric closest to the original dissimilarity measure, measured in the p-norm. Thus, the method is a combination of classical hierarchical clustering and ultrametric fitting. A reference implementation is employed for experiments on both synthetic random data and real world data from a database of machine parts. When compared to existing methods, the experiments show that our method excels both in cluster quality and order preservation.
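
To make the link between dendrograms and ultrametrics concrete, the sketch below covers only the classical, unordered case, not the order-preserving method of the paper. It relies on the standard fact that the single-linkage dendrogram corresponds to the ultrametric in which the distance between two points is the smallest possible "largest hop" over all paths joining them. The function names and the example matrix are assumptions for illustration.

```python
def single_linkage_ultrametric(d):
    """Minimax-path closure of a dissimilarity matrix.

    u[i][j] is the height at which i and j merge under single linkage.
    """
    n = len(d)
    u = [row[:] for row in d]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                u[i][j] = min(u[i][j], max(u[i][k], u[k][j]))
    return u

def is_ultrametric(u, tol=1e-12):
    # Strong triangle inequality: u(i, j) <= max(u(i, k), u(k, j)).
    n = len(u)
    return all(
        u[i][j] <= max(u[i][k], u[k][j]) + tol
        for i in range(n) for j in range(n) for k in range(n)
    )

def p_norm_gap(d, u, p=2):
    # p-norm of the difference between two matrices, over pairs i < j.
    n = len(d)
    total = sum(
        abs(d[i][j] - u[i][j]) ** p for i in range(n) for j in range(i + 1, n)
    )
    return total ** (1 / p)

# Hypothetical dissimilarities between three items.
d = [
    [0, 1, 4],
    [1, 0, 2],
    [4, 2, 0],
]
u = single_linkage_ultrametric(d)
gap = p_norm_gap(d, u, p=2)
```

Here `gap` measures how far the fitted ultrametric is from the original dissimilarity, which is the kind of quantity the abstract's optimality criterion minimises over candidate dendrograms.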


2021 ◽  
Vol 11 (23) ◽  
pp. 11372
Author(s):  
Ataur Rahman ◽  
Nasrullah Khan ◽  
Kishwar Ali ◽  
Rafi Ullah ◽  
Muhammad Ezaz Hasan Khan ◽  
...  

The forest ecosystem has understory vegetation that plays a vital role in sustaining diversity, providing nutrients, and forming a useful association for developing a balanced ecosystem. The current study provides detailed insights into the plant biodiversity and species classification of the understory vegetation of Swat, Pakistan. The floral diversity of the area comprised 58 plant species belonging to 32 families. The physiognomy of the studied area was dominated by the herbaceous growth form, with 47 species. The dominant life-form class was hemicryptophytes with 19 species (33%), followed by nanophanerophytes with 15 species (26%) and therophytes with 13 species (22%). Of the 58 species, 43 were associated with group III, one of the groups obtained by applying Ward’s agglomerative clustering, which indicated wide sociability of the species in the studied oak-dominated forests. Group III had higher species richness (10.3), α-diversity (2.74), β-diversity (9.85), and Margalef index values (3.95). Group I, in contrast, had the maximum Pielou and Simpson index values of 0.97 and 7.13, respectively. Redundancy analysis revealed that seven variables (i.e., latitude, elevation, clay, wilting point, bulk density, saturation, and electric conductivity) significantly influenced the understory vegetation of oak-dominated forests. The understory vegetation of these forests plays an important role in the forest ecosystem of the region. The present study reveals the floral divergence and physiognomic scenario of the unexplored study area, which could be an important reference for future ethnobotanical, phytosociological, and conservational endeavors. Moreover, this information is important to the success of efforts intended to prevent the loss of species diversity caused by the destruction of natural habitats in these forests.
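
For readers unfamiliar with the indices named above, the following sketch computes common textbook forms of them from a list of species abundances. The abstract does not state which variants were used; in particular, treating the Simpson value as the inverse Simpson index is an assumption, and the abundance data are hypothetical.

```python
import math

def diversity_indices(abundances):
    """Textbook diversity indices from per-species abundance counts."""
    counts = [c for c in abundances if c > 0]
    total = sum(counts)
    richness = len(counts)                 # number of species present, S
    props = [c / total for c in counts]    # relative abundances p_i
    shannon = -sum(p * math.log(p) for p in props)
    return {
        "richness": richness,
        "shannon": shannon,                                   # H' = -sum p_i ln p_i
        "pielou": shannon / math.log(richness) if richness > 1 else 0.0,  # H' / ln S
        "inverse_simpson": 1.0 / sum(p * p for p in props),   # 1 / sum p_i^2
        "margalef": (richness - 1) / math.log(total) if total > 1 else 0.0,
    }

# Hypothetical plot with four equally abundant species.
even_plot = diversity_indices([10, 10, 10, 10])
# Hypothetical plot dominated by a single species.
skewed_plot = diversity_indices([37, 1, 1, 1])
```

With equal abundances Pielou's evenness reaches its maximum of 1 and the inverse Simpson index equals the number of species; dominance by one species lowers both.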


2021 ◽  
Author(s):  
Mateus Alex dos Santos Luna ◽  
André Paulino Lima ◽  
Thaís Rodrigues Neubauer ◽  
Marcelo Fantinato ◽  
Sarajane Marques Peres

Process mining explores event logs to offer valuable insights to business process managers. Some types of business processes are hard to mine, including unstructured and knowledge-intensive processes. Trace clustering is therefore usually applied to an event log to break it into sublogs, making it more amenable to typical process mining tasks. However, applying clustering algorithms involves decisions, such as how traces are represented, and the right choices can lead to better results. In this paper, we compare four vector space models for trace clustering, using them with an agglomerative clustering algorithm on synthetic and real-world event logs. Our analyses suggest that the embeddings-based vector space model can properly handle trace clustering in unstructured processes.
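
To show what representing a trace in a vector space can look like, here is a minimal sketch of one simple option: an activity-frequency (bag-of-activities) vector compared by cosine distance. The abstract does not say that this is one of the four models compared, and it is not the embeddings-based model. The event log and function names are hypothetical.

```python
import math
from collections import Counter

def trace_vectors(traces):
    """Map each trace (a list of activity names) to an activity-count vector."""
    vocabulary = sorted({activity for trace in traces for activity in trace})
    vectors = []
    for trace in traces:
        counts = Counter(trace)
        vectors.append([counts[activity] for activity in vocabulary])
    return vocabulary, vectors

def cosine_distance(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm if norm else 1.0

# Hypothetical event log with three traces.
log = [
    ["register", "check", "approve"],
    ["register", "check", "check", "approve"],
    ["register", "reject"],
]
vocabulary, vectors = trace_vectors(log)
d_similar = cosine_distance(vectors[0], vectors[1])
d_different = cosine_distance(vectors[0], vectors[2])
```

The resulting pairwise distances can be fed to any agglomerative clustering routine. Count vectors ignore the order of activities, which is one reason richer representations may behave differently on unstructured processes.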


2021 ◽  
Vol 13 (23) ◽  
pp. 13046
Author(s):  
Philipp A. Friese ◽  
Wibke Michalk ◽  
Markus Fischer ◽  
Cornelius Hardt ◽  
Klaus Bogenberger

This study presents an approach to collect and classify usage data of public charging infrastructure in order to predict usage based on socio-demographic data within a city. The approach comprises data acquisition and a two-step machine learning approach that classifies and predicts usage behavior. Data is acquired by gathering information on charging points from publicly available sources. The first machine learning step identifies four relevant usage patterns from the gathered data using an agglomerative clustering approach. The second step uses a Random Forest classifier to predict usage patterns from socio-demographic factors in a spatial context. This approach makes it possible to predict usage behavior at locations for potential new charging points. When the presented approach is applied to Munich, a large city in Germany, the results confirm its adaptability to complex urban environments. Visualizing the spatial distribution of the predicted usage patterns shows the prevalence of different patterns throughout the city. The presented approach helps municipalities and charging infrastructure operators to identify areas with certain usage patterns, and hence different technical requirements, and to optimize the charging infrastructure in order to help meet the increasing demand for electric mobility.


2021 ◽  
Vol 11 (23) ◽  
pp. 11122
Author(s):  
Thomas Märzinger ◽  
Jan Kotík ◽  
Christoph Pfeifer

This paper is the result of the first-phase, inter-disciplinary work of a multi-disciplinary research project (“Urban pop-up housing environments and their potential as local innovation systems”) consisting of energy engineers and waste managers, landscape architects and spatial planners, innovation researchers and technology assessors. The project aims at globally analyzing and describing existing pop-up housings (PUH) and at developing modeling and assessment tools for sustainable, energy-efficient and socially innovative temporary housing solutions (THS), especially for sustainable and resilient urban structures. The present paper presents an effective application of hierarchical agglomerative clustering (HAC) for analyses of large datasets typically derived from field studies. As can be shown, the method, although well-known and successfully established in (soft) computing science, can also be used very constructively as a potential urban planning tool. The main aim of the underlying multi-disciplinary research project was to deeply analyze and structure THS and PUE. Multiple aspects are to be considered when it comes to the characterization and classification of such environments. A thorough (global) web survey of PUH and an analysis of the scientific literature concerning descriptive work on PUH and THS have been performed. Moreover, among the several approaches and methods tested for classifying PUH, hierarchical clustering algorithms functioned well when properly selected metrics and cut-off criteria were applied. To be specific, the ‘Minkowski’ metric and the ‘Calinski-Harabasz’ criterion, as clustering indices, have shown the best overall results in clustering the inhomogeneous data concerning PUH. Several additional algorithms/functions derived from the field of hierarchical clustering have also been tested to exploit their potential in interpreting and graphically analyzing particular structures and dependencies in the resulting clusters. Hereby, (math.) the significance ‘S’ and (math.) proportion ‘P’ have been concluded to yield the best interpretable and comprehensible results when it comes to analyzing the given set (objects n = 85) of researched PUH-objects together with their properties (n > 190). The resulting easily readable graphs clearly demonstrate the applicability and usability of hierarchical clustering and its derivative algorithms for scientifically profound building classification tasks in Urban Planning by effectively managing huge inhomogeneous building datasets.
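
The two ingredients singled out above can be sketched compactly. The code below shows the Minkowski distance of order p and the Calinski-Harabasz index in its usual form (between-cluster dispersion over within-cluster dispersion, each divided by its degrees of freedom). It is an illustration under assumptions, not the authors' implementation: the abstract gives no value for p, and the toy points and labels are hypothetical.

```python
def minkowski(a, b, p=3):
    """Minkowski distance of order p (p=1 Manhattan, p=2 Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def calinski_harabasz(points, labels):
    """Calinski-Harabasz index; higher values indicate a better partition."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    n, k = len(points), len(groups)
    if not 1 < k < n:
        raise ValueError("need more than one cluster and fewer clusters than points")
    overall = centroid(points)
    between = sum(
        len(g) * squared_distance(centroid(g), overall) for g in groups.values()
    )
    within = sum(
        squared_distance(p, centroid(g)) for g in groups.values() for p in g
    )
    if within == 0:
        return float("inf")  # every cluster collapses onto its centroid
    return (between / (k - 1)) / (within / (n - k))

# Hypothetical objects described by two numeric properties.
objects = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
ch_good = calinski_harabasz(objects, [0, 0, 0, 1, 1, 1])
ch_bad = calinski_harabasz(objects, [0, 1, 0, 1, 0, 1])
```

Used as a cut-off criterion, the index is typically evaluated for each candidate number of clusters obtained by cutting the dendrogram, and the cut with the highest value is kept.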

