Clustering of a Health Dataset Using Diagnosis Co-Occurrences

Assessing the health profiles of populations is a crucial task to create a coherent healthcare offer. Emergency Departments (EDs) are at the core of the healthcare system and could benefit from this evaluation via an improved understanding of the healthcare needs of their population. This paper proposes a novel hierarchical agglomerative clustering algorithm based on multimorbidity analysis. The proposed approach constructs the clustering dendrogram by introducing new quality indicators based on the relative risk of co-occurrences of patient diagnoses. This algorithm enables the detection of multimorbidity patterns by merging similar patient profiles according to their common diagnoses. The multimorbidity approach has been applied to the data of the largest ED of the Aube Department (Eastern France) to cluster its patient visits. Among the 120,718 visits identified during a 24-month period, 16 clusters were identified, accounting for 94.8% of the visits, with the five most prevalent clusters representing 63.0% of them. The new quality indicators show a coherent and good clustering solution with a cluster membership of 1.81 based on a cluster compactness of 1.40 and a cluster separation of 0.77. Compared to the literature, the proposed approach is appropriate for the discovery of multimorbidity patterns and could help to develop better clustering algorithms for more diverse healthcare datasets.
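The abstract does not give the formula behind its relative-risk indicators. As a rough illustration only, the sketch below uses a common observed-over-expected definition of co-occurrence relative risk, RR = (n_ab * N) / (n_a * n_b), which is an assumption and not necessarily the paper's own indicator. The function name `relative_risks` and the toy visits are hypothetical.

```python
from collections import Counter
from itertools import combinations


def relative_risks(visits):
    """Return {(a, b): RR} for every diagnosis pair that co-occurs at least once.

    Assumed definition (not taken from the paper):
    RR = observed co-occurrences / co-occurrences expected under independence
       = (n_ab * N) / (n_a * n_b)
    """
    n_visits = len(visits)
    single = Counter()  # visits per diagnosis
    pair = Counter()    # visits per unordered diagnosis pair
    for diagnoses in visits:
        codes = sorted(set(diagnoses))  # sorted so each pair has one canonical key
        single.update(codes)
        pair.update(combinations(codes, 2))
    return {
        (a, b): n_ab * n_visits / (single[a] * single[b])
        for (a, b), n_ab in pair.items()
    }


# Hypothetical toy data: one set of diagnoses per visit.
visits = [
    {"diabetes", "hypertension"},
    {"diabetes", "hypertension"},
    {"diabetes"},
    {"fracture"},
    {"fracture", "hypertension"},
    {"asthma"},
]
rr = relative_risks(visits)
```

Under this definition, a value above 1 means the pair appears together more often than independence would predict, which is the kind of signal an agglomerative merge step could use to decide which patient profiles are similar.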

Clustering of Micro-Messages Using Similarity Upper Approximation

International Journal of Uncertainty Fuzziness and Knowledge-Based Systems ◽

10.1142/s0218488517500039 ◽

2017 ◽

Vol 25 (01) ◽

pp. 53-79 ◽

Cited By ~ 1

Author(s):

Mukul Gupta ◽

Pradeep Kumar ◽

Bharat Bhasker

Keyword(s):

Clustering Algorithm ◽

State Of The Art ◽

Clustering Algorithms ◽

Mining Operation ◽

Agglomerative Clustering ◽

Upper Approximation ◽

Affinity Propagation Clustering ◽

Hierarchical Agglomerative Clustering ◽

Text Content ◽

Extract Information

Microblogging platforms like Twitter, Tumblr and Plurk have radically changed our lives. The presence of millions of people has made these platforms a preferred channel for communication. The large amount of User Generated Content on these platforms has attracted researchers and practitioners to mine and extract information nuggets. For information extraction, clustering is an important and widely used mining operation. This paper addresses the clustering of micro-messages and their corresponding users based on the text content of the micro-messages, which reflects the users' primitive interests. We modified the Similarity Upper Approximation based clustering algorithm for clustering micro-messages and compared its performance with state-of-the-art clustering algorithms such as Partition Around Medoids, Hierarchical Agglomerative Clustering, Affinity Propagation Clustering and DBSCAN. Experiments were performed on micro-messages collected from Twitter. Experimental results show the effectiveness of the proposed algorithm.

A Hierarchical Clustering Algorithm Based on Silhouette Index for Cancer Subtype Discovery from Omics Data

10.1101/309716 ◽

2018 ◽

Cited By ~ 1

Author(s):

N. Nidheesh ◽

K.A. Abdul Nazeer ◽

P.M. Ameer

Keyword(s):

Clustering Algorithm ◽

Clustering Algorithms ◽

Agglomerative Clustering ◽

Cluster Validity ◽

Cluster Validity Index ◽

Number Of Clusters ◽

Silhouette Index ◽

Cancer Subtype ◽

Hierarchical Agglomerative Clustering ◽

Hierarchical Clustering Algorithm

Cancer subtype discovery from omics data requires techniques to estimate the number of natural clusters in the data. Automatically estimating the number of clusters has been a challenging problem in Machine Learning. Using clustering algorithms together with internal cluster validity indexes has been a popular method of estimating the number of clusters in biomolecular data. We propose a Hierarchical Agglomerative Clustering algorithm, named SilHAC, which can automatically estimate the number of natural clusters and can find the associated clustering solution. SilHAC is parameterless. We also present two hybrids of SilHAC, with Spectral Clustering and K-Means respectively as components. SilHAC and the hybrids could find reasonable estimates for the number of clusters and the associated clustering solution when applied to a collection of cancer gene expression datasets. The proposed methods are better alternatives to the ‘clustering algorithm - internal cluster validity index’ pipelines for estimating the number of natural clusters.
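For context, the conventional "clustering algorithm plus internal validity index" pipeline that the abstract contrasts itself with can be sketched as follows. This is not SilHAC: it is a naive average-linkage agglomerative clustering run for several candidate values of k, keeping the k with the highest mean silhouette. All function names and the toy points are hypothetical.

```python
from math import dist


def average_linkage(points, k):
    """Naive agglomerative clustering: merge the two closest clusters until k remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise distance between the two clusters.
                d = sum(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # b > a, so index a stays valid
    return clusters


def silhouette(points, clusters):
    """Mean silhouette width; singleton clusters contribute 0 by convention."""
    total = 0.0
    for ci, members in enumerate(clusters):
        if len(members) == 1:
            continue
        for i in members:
            a = sum(dist(points[i], points[j]) for j in members if j != i)
            a /= len(members) - 1
            b = min(
                sum(dist(points[i], points[j]) for j in other) / len(other)
                for cj, other in enumerate(clusters) if cj != ci
            )
            if max(a, b) > 0:
                total += (b - a) / max(a, b)
    return total / len(points)


def best_k(points, k_max):
    """Score k = 2..k_max and return the k with the highest mean silhouette."""
    scores = {k: silhouette(points, average_linkage(points, k))
              for k in range(2, k_max + 1)}
    return max(scores, key=scores.get), scores


# Hypothetical toy data: three well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
k, scores = best_k(points, 5)
```

On this toy data the silhouette peaks at three clusters. The naive merge loop is cubic or worse in the number of points, so it only serves to make the pipeline concrete.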

A Hard C-Means Clustering Algorithm Incorporating Membership KL Divergence and Local Data Information for Noisy Image Segmentation

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s021800141850012x ◽

2017 ◽

Vol 32 (04) ◽

pp. 1850012 ◽

Cited By ~ 5

Author(s):

R. R. Gharieb ◽

G. Gendy ◽

H. Selim

Keyword(s):

Image Segmentation ◽

Membership Function ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Cluster Center ◽

Local Data ◽

Cluster Membership ◽

Kl Divergence ◽

Clustering Approach ◽

Center Distance

In this paper, the standard hard C-means (HCM) clustering approach to image segmentation is modified by incorporating weighted membership Kullback–Leibler (KL) divergence and local data information into the HCM objective function. The membership KL divergence, used for fuzzification, measures the proximity between each cluster membership function of a pixel and the locally-smoothed value of the membership in the pixel vicinity. The fuzzification weight is a function of the pixel-to-cluster-center distances. The pixel-to-cluster-center distance used is composed of the original pixel data distance plus a fraction of the distance generated from the locally-smoothed pixel data. It is shown that the obtained membership function of a pixel is proportional to the locally-smoothed membership function of this pixel multiplied by an exponentially distributed function of the negated pixel distance relative to the minimum distance provided by the nearest cluster-center to the pixel. Because it incorporates the locally-smoothed membership and data information in addition to the relative distance, which is more tolerant to additive noise than the absolute distance, the proposed algorithm has a threefold noise-handling process. The presented algorithm, named local data and membership KL divergence based fuzzy C-means (LDMKLFCM), is tested on synthetic and real-world noisy images and its results are compared with those of several FCM-based clustering algorithms.

Radar Emission Sources Identification Based on Hierarchical Agglomerative Clustering for Large Data Sets

Journal of Sensors ◽

10.1155/2016/1879327 ◽

2016 ◽

Vol 2016 ◽

pp. 1-9 ◽

Cited By ~ 21

Author(s):

Janusz Dudczyk

Keyword(s):

Clustering Algorithm ◽

Large Data ◽

Large Data Sets ◽

Emission Sources ◽

Data Sets ◽

Agglomerative Clustering ◽

Distinctive Features ◽

Identification Process ◽

Hierarchical Agglomerative Clustering ◽

Repetition Interval

More advanced recognition methods, which may recognize particular copies of radars of the same type, are called identification. The identification of radar devices is a more specialized task which requires methods based on the analysis of distinctive features. These features are extracted from the signals coming from the identified devices. Such a process is called Specific Emitter Identification (SEI). The identification of radar emission sources with classic techniques based on the statistical analysis of basic measurable parameters of a signal, such as Radio Frequency, Amplitude, Pulse Width, or Pulse Repetition Interval, is not sufficient for SEI problems. This paper presents the method of hierarchical data clustering which is used in the process of radar identification. The Hierarchical Agglomerative Clustering Algorithm (HACA) based on Generalized Agglomerative Scheme (GAS) implemented and used in the research method is parameterized; therefore, it is possible to compare the results. The results of clustering are presented in dendrograms in this paper. The grouping and identification results obtained with HACA are compared with other SEI methods in order to assess the degree of their usefulness and effectiveness for systems of ESM/ELINT class.

Research on NMF Based Hierarchical Clustering Methods

Key Engineering Materials ◽

10.4028/www.scientific.net/kem.439-440.1306 ◽

2010 ◽

Vol 439-440 ◽

pp. 1306-1311

Author(s):

Fang Li ◽

Qun Xiong Zhu

Keyword(s):

Hierarchical Clustering ◽

Clustering Algorithm ◽

Clustering Methods ◽

Agglomerative Clustering ◽

Clustering Method ◽

Hierarchical Agglomerative Clustering ◽

Hierarchical Clustering Methods

The LSI-based hierarchical agglomerative clustering algorithm is studied. To address the problems of the LSI-based hierarchical agglomerative clustering method, an NMF-based hierarchical clustering method is proposed and analyzed. Two ways of implementing the NMF-based method are introduced. Finally, the results of two groups of experiments based on the TanCorp document corpora show that the proposed method is effective.

A degree-distribution based hierarchical agglomerative clustering algorithm for protein complexes identification

Computational Biology and Chemistry ◽

10.1016/j.compbiolchem.2011.07.005 ◽

2011 ◽

Vol 35 (5) ◽

pp. 298-307 ◽

Cited By ~ 2

Author(s):

Liang Yu ◽

Lin Gao ◽

Kui Li ◽

Yi Zhao ◽

David K.Y. Chiu

Keyword(s):

Degree Distribution ◽

Clustering Algorithm ◽

Protein Complexes ◽

Agglomerative Clustering ◽

Hierarchical Agglomerative Clustering

Design of an Unsupervised Machine Learning-Based Movie Recommender System

10.20944/preprints202001.0124.v1 ◽

2020 ◽

Author(s):

Debby Cintia Ganesha Putri ◽

Jenq-Shiou Leu ◽

Pavel Seda

Keyword(s):

Recommender System ◽

Clustering Algorithm ◽

System Development ◽

Clustering Algorithms ◽

Mean Shift ◽

Computational Time ◽

Agglomerative Clustering ◽

Method Performance ◽

Cluster Validity Indices ◽

Validity Indices

This research aims to determine the similarities in groups of people to build a film recommender system for users. Users often have difficulty in finding suitable movies due to the increasing amount of movie information. The recommender system is very useful for helping customers choose a preferred movie with the existing features. In this study, the recommender system is developed by using several algorithms to obtain groupings, such as the K-Means algorithm, BIRCH algorithm, mini-batch K-Means algorithm, mean-shift algorithm, affinity propagation algorithm, agglomerative clustering algorithm, and spectral clustering algorithm. We propose methods optimizing K so that each cluster may not significantly increase variance. We are limited to using groupings based on Genre and Tags for movies. This research can discover better methods for evaluating clustering algorithms. To verify the quality of the recommender system, we adopted the mean square error (MSE), such as the Dunn Matrix and Cluster Validity Indices, and social network analysis (SNA), such as Degree Centrality, Closeness Centrality, and Betweenness Centrality. We also used Average Similarity, Computational Time, Association Rule with Apriori algorithm, and Clustering Performance Evaluation as evaluation measures to compare method performance of recommender systems using Silhouette Coefficient, Calinski-Harabaz Index, and Davies-Bouldin Index.

Quality Assured Optimal Resource Provisioning and Scheduling Technique Based on Improved Hierarchical Agglomerative Clustering Algorithm (IHAC)

International Journal of Engineering and Technology ◽

10.21817/ijet/2016/v8i4/160804402 ◽

2016 ◽

Vol 8 (4) ◽

pp. 1627-1641

Author(s):

Meenakshi A. ◽

Sirmathi H. ◽

Anitha Ruth J.

Keyword(s):

Clustering Algorithm ◽

Resource Provisioning ◽

Agglomerative Clustering ◽

Hierarchical Agglomerative Clustering ◽

Optimal Resource ◽

Scheduling Technique

Analysis of Electric Energy Consumption Profiles Using a Machine Learning Approach: A Paraguayan Case Study

Electronics ◽

10.3390/electronics11020267 ◽

2022 ◽

Vol 11 (2) ◽

pp. 267

Author(s):

Félix Morales ◽

Miguel García-Torres ◽

Gustavo Velázquez ◽

Federico Daumas-Ladouce ◽

Pedro E. Gardel-Sotomayor ◽

...

Keyword(s):

Clustering Algorithms ◽

Electric Energy ◽

Real Data ◽

Original Data ◽

Data Sets ◽

Agglomerative Clustering ◽

Daily Consumption ◽

Load Curve ◽

Electric Energy Consumption ◽

Hierarchical Agglomerative Clustering

Correctly defining and grouping electrical feeders is of great importance for electrical system operators. In this paper, we compare two different clustering techniques, K-means and hierarchical agglomerative clustering, applied to real data from the east region of Paraguay. The raw data were pre-processed, resulting in four data sets, namely, (i) a weekly feeder demand, (ii) a monthly feeder demand, (iii) a statistical feature set extracted from the original data and (iv) a seasonal and daily consumption feature set obtained considering the characteristics of the Paraguayan load curve. Considering the four data sets, two clustering algorithms, two distance metrics and five linkage criteria, a total of 36 models were assessed using the Silhouette, Davies–Bouldin and Calinski–Harabasz index scores. The K-means algorithm with the seasonal feature data set showed the best performance on the Silhouette, Calinski–Harabasz and Davies–Bouldin validation index scores, with a configuration of six clusters.
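To make one of the validation scores named above concrete, here is a minimal sketch of the Davies–Bouldin index using centroid-based scatter and Euclidean distance. It is a generic textbook formulation, not the paper's code; the function names and the toy clusterings are hypothetical. Lower values indicate more compact, better-separated clusters.

```python
from math import dist


def centroid(cluster):
    """Coordinate-wise mean of a list of equal-length point tuples."""
    n = len(cluster)
    return tuple(sum(p[d] for p in cluster) / n for d in range(len(cluster[0])))


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (each a list of points)."""
    centers = [centroid(c) for c in clusters]
    # Scatter: mean distance of each cluster's points to its own centroid.
    scatter = [sum(dist(p, ctr) for p in c) / len(c)
               for c, ctr in zip(clusters, centers)]
    k = len(clusters)
    # For each cluster, take its worst (largest) similarity ratio to any other cluster.
    worst = [
        max((scatter[i] + scatter[j]) / dist(centers[i], centers[j])
            for j in range(k) if j != i)
        for i in range(k)
    ]
    return sum(worst) / k


# Hypothetical toy data: the same four points grouped two different ways.
good = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]   # groups nearby points
bad = [[(0, 0), (10, 0)], [(0, 2), (10, 2)]]    # groups distant points
good_score = davies_bouldin(good)
bad_score = davies_bouldin(bad)
```

The grouping that keeps nearby points together scores 0.2, while the grouping that pairs distant points scores 5.0, so ranking candidate models by this index favours the first.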
