E-Focused Crawler and Hierarchical Agglomerative Clustering Approach for Automated Categorization of Feature-Level Healthcare Sentiments on Social Media

Purpose: Service-oriented architecture is an emerging software architecture in which web services (WSs) play a crucial role. In this architecture, WS composition and verification are required when handling complex service requirements from users. When the number of WSs becomes very large in practice, the complexity of composition and verification is correspondingly high. In this paper, the authors propose a logic-based clustering approach that addresses this problem by separating the original WS repository into clusters. They also propose a so-called quality-controlled clustering approach to ensure the quality of the generated clusters within a reasonable execution time. Design/methodology/approach: The approach represents WSs as logical formulas, on which the authors conduct the clustering task. They also combine two of the most popular clustering approaches, hierarchical agglomerative clustering (HAC) and k-means, to ensure the quality of the generated clusters. Findings: The logic-based clustering approach significantly increases the performance of WS composition and verification. Furthermore, it helps maintain the soundness and completeness of the composition solution. Finally, the quality-controlled strategy can ensure the quality of the generated clusters with low time complexity. Research limitations/implications: The work discussed in this paper is implemented only as a research tool known as WSCOVER. More work is needed to make it a practical and usable system for real-life applications. Originality/value: In this paper, the authors propose a logic-based paradigm to represent and cluster WSs. They also propose a quality-controlled clustering approach that combines, and takes advantage of, two of the most popular clustering approaches, HAC and k-means.
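One common way to combine HAC and k-means is to let HAC produce initial groups and then refine them with k-means seeded from the HAC centroids. The sketch below illustrates only that general pattern. It assumes numeric points and Euclidean distance, whereas the abstract describes clustering over logical formulas, and it is not the WSCOVER procedure. All function names and the sample points are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def centroid(points):
    """Mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def hac(points, k):
    """Average-linkage agglomerative clustering, merged down to k clusters."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between the two clusters.
                d = sum(dist(a, b) for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def kmeans(points, centers, max_iter=100):
    """Lloyd's k-means iterations starting from the given centers."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            groups[nearest].append(p)
        # Keep the old center if a group happens to become empty.
        new_centers = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return groups, centers


def hac_seeded_kmeans(points, k):
    """Assumed combination: HAC centroids are used as k-means seeds."""
    seeds = [centroid(c) for c in hac(points, k)]
    return kmeans(points, seeds)


# Hypothetical sample data: two visibly separated groups.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
groups, centers = hac_seeded_kmeans(points, 2)
```

The pairwise search in `hac` is quadratic per merge, so this form only suits small inputs. Seeding k-means from HAC removes the dependence on random initialization, which is one plausible reason to pair the two methods.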

PerioClust: A Simple Hierarchical Agglomerative Clustering Approach Including Constraints

Data Analysis and Rationality in a Complex World - Studies in Classification, Data Analysis, and Knowledge Organization ◽

10.1007/978-3-030-60104-1_1 ◽

2021 ◽

pp. 1-8

Author(s):

Lise Bellanger ◽

Arthur Coulon ◽

Philippe Husi

Keyword(s):

Agglomerative Clustering ◽

Hierarchical Agglomerative Clustering ◽

Clustering Approach

Hierarchical Agglomerative Clustering approach for Automated Attribute Classification of the Health Care Domain from User Generated Reviews on Web 2.0

2020 IEEE International Conference on Computing, Power and Communication Technologies (GUCON) ◽

10.1109/gucon48875.2020.9231122 ◽

2020 ◽

Author(s):

Saroj Kushwaha ◽

Sanjoy Das

Keyword(s):

Health Care ◽

Web 2.0 ◽

Agglomerative Clustering ◽

Hierarchical Agglomerative Clustering ◽

Attribute Classification ◽

Clustering Approach

Clustering Techniques for Secondary Substations Siting

Energies ◽

10.3390/en14041028 ◽

2021 ◽

Vol 14 (4) ◽

pp. 1028

Author(s):

Silvia Corigliano ◽

Federico Rosato ◽

Carla Ortiz Dominguez ◽

Marco Merlo

Keyword(s):

Rural Areas ◽

Urban Areas ◽

Universal Access ◽

Distribution Networks ◽

Industrialized Countries ◽

Agglomerative Clustering ◽

Clustering Techniques ◽

Hierarchical Agglomerative Clustering ◽

Efficient Planning ◽

Target Set

The scientific community is active in developing new models and methods to help reach the ambitious target set by UN SDG 7: universal access to electricity by 2030. Efficient planning of distribution networks is a complex and multivariate task, which is usually split into multiple subproblems to reduce the number of variables. The present work addresses the problem of optimal secondary substation siting by means of different clustering techniques. In contrast with the majority of approaches found in the literature, which are devoted to the planning of MV grids in already electrified urban areas, this work focuses on greenfield planning in rural areas. The K-means algorithm, hierarchical agglomerative clustering, and a method based on optimal weighted tree partitioning are adapted to the problem and run on two real case studies with different population densities. The algorithms are compared in terms of different indicators useful to assess the feasibility of the solutions found. The algorithms have proven to be effective in addressing some of the crucial aspects of substation siting and to constitute relevant improvements to the classic K-means approach found in the literature. However, it is found that, when the load density is very low, it is very challenging to reconcile an acceptable geographical span of the area served by a single substation with a substation power high enough to justify the installation. In other words, well-known standards adopted in industrialized countries do not fit developing countries’ requirements.

Identifying organ dysfunction trajectory-based subphenotypes in critically ill patients with COVID-19

Scientific Reports ◽

10.1038/s41598-021-95431-7 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Chang Su ◽

Zhenxing Xu ◽

Katherine Hoffman ◽

Parag Goyal ◽

Monika M. Safford ◽

...

Keyword(s):

New York ◽

Respiratory Failure ◽

Sofa Score ◽

Severity Of Illness ◽

Agglomerative Clustering ◽

Baseline Severity ◽

Organ Systems ◽

Hierarchical Agglomerative Clustering ◽

Dynamic Time ◽

Post Intubation

COVID-19-associated respiratory failure offers the unprecedented opportunity to evaluate the differential host response to a uniform pathogenic insult. Understanding whether there are distinct subphenotypes of severe COVID-19 may offer insight into its pathophysiology. The Sequential Organ Failure Assessment (SOFA) score is an objective and comprehensive measure of the dysfunction severity of six organ systems, i.e., cardiovascular, central nervous system, coagulation, liver, renal, and respiration. Our aim was to identify and characterize distinct subphenotypes of COVID-19 critical illness defined by the post-intubation trajectory of SOFA score. Intubated COVID-19 patients at two hospitals in New York City were leveraged as development and validation cohorts. Patients were grouped into mild, intermediate, and severe strata by their baseline post-intubation SOFA. Hierarchical agglomerative clustering was performed within each stratum to detect subphenotypes based on similarities amongst SOFA score trajectories evaluated by Dynamic Time Warping. Distinct worsening and recovering subphenotypes were identified within each stratum, which had distinct 7-day post-intubation SOFA progression trends. Patients in the worsening subphenotypes had a higher mortality than those in the recovering subphenotypes within each stratum (mild stratum, 29.7% vs. 10.3%, p = 0.033; intermediate stratum, 29.3% vs. 8.0%, p = 0.002; severe stratum, 53.7% vs. 22.2%, p < 0.001). Pathophysiologic biomarkers associated with progression were distinct at each stratum, including findings suggestive of inflammation in low baseline severity of illness versus hemophagocytic lymphohistiocytosis in higher baseline severity of illness. The findings suggest that there are clear worsening and recovering subphenotypes of COVID-19 respiratory failure after intubation, which are more predictive of outcomes than baseline severity of illness. Distinct progression biomarkers at differential baseline severity of illness suggest a heterogeneous pathobiology in the progression of COVID-19 respiratory failure.
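Clustering trajectories by Dynamic Time Warping followed by hierarchical agglomerative clustering can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: it uses the textbook DTW recurrence with absolute difference as the local cost and average linkage for merging, neither of which is confirmed by the abstract. The trajectories are invented toy numbers, not patient data, and the function names are hypothetical.

```python
def dtw_distance(a, b):
    """Textbook dynamic time warping cost between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def cluster_by_distance(items, k, distance):
    """Average-linkage agglomerative clustering; returns lists of item indices."""
    d = [[distance(x, y) for y in items] for x in items]
    clusters = [[i] for i in range(len(items))]
    while len(clusters) > k:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                link = sum(d[i][j] for i in clusters[p] for j in clusters[q])
                link /= len(clusters[p]) * len(clusters[q])
                if best is None or link < best[0]:
                    best = (link, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


# Invented toy trajectories (NOT patient data): one score per day for 7 days.
trajectories = [
    [6, 7, 8, 9, 10, 11, 12],  # rising
    [6, 6, 7, 9, 10, 12, 12],  # rising, slightly shifted in time
    [6, 5, 5, 4, 3, 2, 2],     # falling
    [6, 6, 5, 4, 4, 3, 2],     # falling, slightly shifted in time
]
trajectory_groups = cluster_by_distance(trajectories, 2, dtw_distance)
```

DTW lets two trajectories with the same shape but slightly different timing count as similar, which a day-by-day Euclidean comparison would penalize.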

Embed2Detect: temporally clustered embedded words for event detection in social media

Machine Learning ◽

10.1007/s10994-021-05988-7 ◽

2021 ◽

Author(s):

Hansi Hettiarachchi ◽

Mariam Adedoyin-Olowe ◽

Jagdev Bhogal ◽

Mohamed Medhat Gaber

Keyword(s):

Social Media ◽

Event Detection ◽

High Volume ◽

Detection Methods ◽

Word Embeddings ◽

Agglomerative Clustering ◽

Data Set ◽

Social Media Data ◽

Social Media Platforms ◽

Media Data

Social media is becoming a primary medium to discuss what is happening around the world. Therefore, the data generated by social media platforms contain rich information which describes ongoing events. Further, the timeliness associated with these data is capable of facilitating immediate insights. However, considering the dynamic nature and high volume of data production in social media data streams, it is impractical to filter the events manually, and automated event detection mechanisms are therefore invaluable to the community. Apart from a few notable exceptions, most previous research on automated event detection has focused only on statistical and syntactical features in data and has lacked the involvement of underlying semantics, which are important for effective information retrieval from text since they represent the connections between words and their meanings. In this paper, we propose a novel method termed Embed2Detect for event detection in social media by combining the characteristics in word embeddings and hierarchical agglomerative clustering. The adoption of word embeddings gives Embed2Detect the capability to incorporate powerful semantical features into event detection and overcome a major limitation inherent in previous approaches. We evaluated our method on two recent real social media data sets, which represent the sports and political domains, and compared the results to several state-of-the-art methods. The obtained results show that Embed2Detect is capable of effective and efficient event detection and that it outperforms recent event detection methods. For the sports data set, Embed2Detect achieved a 27% higher F-measure than the best-performing baseline, and for the political data set the increase was 29%.
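To make the pairing of word embeddings with agglomerative clustering concrete, the sketch below groups words whose vectors are close in cosine distance, merging until the closest remaining pair of clusters exceeds a threshold. It does not reproduce the Embed2Detect pipeline. The single-linkage rule, the threshold value, the function names and the three-dimensional vectors are all assumptions chosen for illustration; real embeddings typically have hundreds of dimensions.

```python
from math import sqrt


def cosine_distance(u, v):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def group_words(vectors, threshold):
    """Single-linkage agglomeration that stops at a distance threshold."""
    clusters = [[word] for word in vectors]
    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                # Single linkage: distance of the closest pair of members.
                link = min(
                    cosine_distance(vectors[a], vectors[b])
                    for a in clusters[p]
                    for b in clusters[q]
                )
                if best is None or link < best[0]:
                    best = (link, p, q)
        link, p, q = best
        if link > threshold:
            break  # the closest clusters are too far apart to merge
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


# Hypothetical toy "embeddings", invented for this example.
toy_vectors = {
    "goal": (0.9, 0.1, 0.0),
    "penalty": (0.8, 0.2, 0.1),
    "vote": (0.1, 0.9, 0.1),
    "election": (0.0, 0.8, 0.2),
}
word_groups = group_words(toy_vectors, threshold=0.2)
```

Cutting by distance threshold rather than by a fixed cluster count is convenient when the number of topics in a stream is not known in advance.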

Hierarchical Agglomerative Clustering

Encyclopedia of Systems Biology ◽

10.1007/978-1-4419-9863-7_1371 ◽

2013 ◽

pp. 886-887 ◽

Cited By ~ 28

Author(s):

Marie Lisandra Zepeda-Mendoza ◽

Osbaldo Resendis-Antonio

Keyword(s):

Agglomerative Clustering ◽

Hierarchical Agglomerative Clustering

Supporting Personalized Health Care With Social Media Analytics: An Application to Hypothyroidism

ACM Transactions on Computing for Healthcare ◽

10.1145/3468781 ◽

2022 ◽

Vol 3 (1) ◽

pp. 1-28

Author(s):

Giorgio Grani ◽

Andrea Lenzi ◽

Paola Velardi

Keyword(s):

Social Media ◽

Data Extraction ◽

Social Media Analytics ◽

Text Compression ◽

Emotional States ◽

Agglomerative Clustering ◽

Detection Model ◽

Analytic Process ◽

Personalized Health ◽

Personalized Health Care

Social media analytics can considerably contribute to understanding health conditions beyond clinical practice, by capturing patients’ discussions and feelings about their quality of life in relation to disease treatments. In this article, we propose a methodology to support a detailed analysis of the therapeutic experience in patients affected by a specific disease, as it emerges from health forums. As a use case to test the proposed methodology, we analyze the experience of patients affected by hypothyroidism and their reactions to standard therapies. Our approach is based on a data extraction and filtering pipeline, a novel topic detection model named Generative Text Compression with Agglomerative Clustering Summarization (GTCACS), and an in-depth data analytic process. We advance the state of the art on automated detection of adverse drug reactions (ADRs) since, rather than simply detecting and classifying positive or negative reactions to a therapy, we are capable of providing a fine characterization of patients along different dimensions, such as co-morbidities, symptoms, and emotional states.

An Approach for Fast Hierarchical Agglomerative Clustering Using Graphics Processors with CUDA

Advances in Knowledge Discovery and Data Mining - Lecture Notes in Computer Science ◽

10.1007/978-3-642-13672-6_4 ◽

2010 ◽

pp. 35-42 ◽

Cited By ~ 4

Author(s):

S. A. Arul Shalom ◽

Manoranjan Dash ◽

Minh Tue

Keyword(s):

Graphics Processors ◽

Agglomerative Clustering ◽

Hierarchical Agglomerative Clustering

Chromatographic, Chemometric and Antioxidant Assessment of the Equivalence of Granules and Herbal Materials of Angelicae Sinensis Radix

Medicines ◽

10.3390/medicines7060035 ◽

2020 ◽

Vol 7 (6) ◽

pp. 35

Author(s):

Valentina Razmovski-Naumovski ◽

Xian Zhou ◽

Ho Yee Wong ◽

Antony Kam ◽

Jarryd Pearson ◽

...

Keyword(s):

Ferulic Acid ◽

Caffeic Acid ◽

Radical Scavenging ◽

Principal Component ◽

Ultra Performance Liquid Chromatography ◽

Array Detector ◽

Agglomerative Clustering ◽

Antioxidant Power ◽

Hierarchical Agglomerative Clustering ◽

Angelicae Sinensis

Background: Granules are a popular way of administering herbal decoctions. However, there are no standardised quality control methods for granules, and few studies compare granules to traditional herbal decoctions. This study developed a multi-analytical platform to compare the quality of granule products to herb/decoction pieces of Angelicae Sinensis Radix (Danggui). Methods: A validated ultra-performance liquid chromatography coupled with photodiode array detector (UPLC-PDA) method quantitatively compared the aqueous extracts. Hierarchical agglomerative clustering analysis (HCA) and principal component analysis (PCA) clustered the samples according to three chemical compounds: ferulic acid, caffeic acid and Z-ligustilide. Ferric ion-reducing antioxidant power (FRAP) and 2,2-Diphenyl-1-picrylhydrazyl radical scavenging capacity (DPPH) assessed the antioxidant activity of the samples. Results: HCA and PCA allocated the samples into two main groups: granule products and herb/decoction pieces. Greater differentiation between the samples was obtained with three chemical markers compared to using one marker. The herb/decoction pieces group showed comparatively higher extraction yields and significantly higher DPPH and FRAP (p < 0.05), which were positively correlated to caffeic acid and ferulic acid, respectively. Conclusions: The results confirm the need for the quality assessment of granule products using more than one chemical marker for widespread practitioner and consumer use.
