Clustering Methods for Spherical Data: An Overview and a New Generalization

Author(s):  
Sungsu Kim ◽  
Ashis SenGupta
Author(s):  
Yuchi Kanzawa ◽  

In this paper, a power-regularization-based fuzzy clustering method is proposed for spherical data. Power regularization has not previously been applied to fuzzy clustering for spherical data. For a specific value of the fuzzification parameter, the proposed method reduces to a conventional fuzzy clustering method, entropy-regularized fuzzy clustering for spherical data (eFCS). Numerical experiments on two artificial datasets reveal the properties of the proposed method. Furthermore, numerical experiments on four real datasets indicate that this method is more accurate than the conventional fuzzy clustering methods: standard fuzzy clustering for spherical data (sFCS) and eFCS.
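As a rough illustration of the conventional baseline mentioned above, the following is a minimal sketch of a generic entropy-regularized fuzzy clustering on the unit sphere, using cosine similarity. It is not the formulation from the paper: the function name `entropy_fuzzy_spherical`, the parameter `lam` (standing in for the fuzzification parameter), the softmax membership rule, and the optional `init` argument are all assumptions made for this example.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length (input must be non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def memberships_for(points, centers, lam):
    """Softmax of lam * cosine similarity; each row sums to 1."""
    rows = []
    for p in points:
        sims = [sum(a * b for a, b in zip(p, c)) for c in centers]
        top = max(sims)  # subtract the maximum for numerical stability
        weights = [math.exp(lam * (s - top)) for s in sims]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


def entropy_fuzzy_spherical(data, n_clusters, lam=5.0, n_iter=50,
                            init=None, seed=0):
    """Assumed generic scheme, not the paper's exact algorithm."""
    points = [normalize(p) for p in data]
    if init is None:
        rng = random.Random(seed)
        centers = [list(p) for p in rng.sample(points, n_clusters)]
    else:
        centers = [normalize(c) for c in init]
    dim = len(points[0])
    for _ in range(n_iter):
        u = memberships_for(points, centers, lam)
        for i in range(n_clusters):
            # Membership-weighted mean, projected back onto the sphere.
            mean = [sum(row[i] * p[d] for row, p in zip(u, points))
                    for d in range(dim)]
            if any(abs(m) > 0.0 for m in mean):
                centers[i] = normalize(mean)
    return memberships_for(points, centers, lam), centers
```

Larger values of `lam` make the memberships crisper in this sketch, while values near zero push every point toward equal membership in all clusters.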


2018 ◽  
Vol 18 (13) ◽  
pp. 1110-1122 ◽  
Author(s):  
Juan F. Morales ◽  
Lucas N. Alberca ◽  
Sara Chuguransky ◽  
Mauricio E. Di Ianni ◽  
Alan Talevi ◽  
...  

Much attention has been paid in the last decade to molecular predictors of promiscuity, including molecular weight, log P, molecular complexity, acidity constant and molecular topology, with correlations between promiscuity and those descriptors seemingly being context-dependent. It has been observed that certain therapeutic categories (e.g. mood disorder therapies) display a tendency to include multi-target agents (i.e. selective non-selectivity). Numerous QSAR models based on topological descriptors suggest that the topology of a given drug could be used to infer its therapeutic applications. Here, we have used descriptive statistics to explore the distribution of molecular topology descriptors and other promiscuity predictors across different therapeutic categories. Working with the publicly available ChEMBL database and 14 molecular descriptors, both hierarchical and non-hierarchical clustering methods were applied to the mean descriptor values of the therapeutic categories after refinement of the database (770 drugs grouped into 34 therapeutic categories). In addition, another publicly available database (repoDB) was used to retrieve clinically approved drug repositioning cases that could be classified into the therapeutic categories considered by the aforementioned clusters (111 cases), and the correspondence between the two studies was evaluated. Interestingly, a 3-cluster hierarchical clustering scheme based on only 14 molecular descriptors linked to promiscuity seems to explain up to 82.9% of the approved cases of drug repurposing retrieved from repoDB. Therapeutic categories seem to display distinctive molecular patterns, which could be used as a basis for drug screening and drug design campaigns, and to unveil drug repurposing opportunities between particular therapeutic categories.


2021 ◽  
Vol 13 (11) ◽  
pp. 2125
Author(s):  
Bardia Yousefi ◽  
Clemente Ibarra-Castanedo ◽  
Martin Chamberland ◽  
Xavier P. V. Maldague ◽  
Georges Beaudoin

Clustering methods have considerable influence on many recent algorithms and play an important role in hyperspectral data analysis. Here, we examine clustering for mineral identification using two different strategies in hyperspectral long wave infrared (LWIR, 7.7–11.8 μm). To that end, we compare two algorithms that perform mineral identification on a unique dataset. The first algorithm uses spectral comparison techniques for all the pixel-spectra and creates RGB false color composites (FCC). Then, color-based clustering is used to group the regions (called FCC-clustering). The second algorithm clusters all the pixel-spectra to directly group the spectra. Then, the first rank of non-negative matrix factorization (NMF) extracts the representative of each cluster and compares the results with the JPL/NASA spectral library. These techniques give the comparison values as features, which are converted into RGB-FCC as the results (called clustering-rank1-NMF). We applied K-means as the clustering approach, which can be replaced by any similar clustering approach. The results of the clustering-rank1-NMF algorithm indicate significant computational efficiency (more than 20 times faster than the previous approach) and promising performance for mineral identification, with average accuracies of up to 75.8% and 84.8% for the FCC-clustering and clustering-rank1-NMF algorithms (using spectral angle mapper (SAM)), respectively. Furthermore, several spectral comparison techniques are used for both algorithms, such as adaptive matched subspace detector (AMSD), orthogonal subspace projection (OSP) algorithm, principal component analysis (PCA), local matched filter (PLMF), SAM, and normalized cross correlation (NCC), and most of them show a similar range in accuracy. However, SAM and NCC are preferred due to their computational simplicity.
Our algorithms strive to identify eleven different mineral grains (biotite, diopside, epidote, goethite, kyanite, scheelite, smithsonite, tourmaline, pyrope, olivine, and quartz).
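To make the spectral angle mapper (SAM) comparison concrete, here is a minimal sketch of the standard SAM measure: the angle between two spectra treated as vectors. The function names `spectral_angle` and `best_match` and the toy library values are hypothetical and are not taken from the paper or from the JPL/NASA library.

```python
import math


def spectral_angle(a, b):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("spectra must be non-zero")
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    cos = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cos)


def best_match(pixel, library):
    """Return the library entry name with the smallest angle to pixel."""
    return min(library, key=lambda name: spectral_angle(pixel, library[name]))


# Toy reference spectra (made-up numbers, for illustration only).
toy_library = {"quartz": [1.0, 2.0, 3.0], "olivine": [3.0, 2.0, 1.0]}
label = best_match([2.0, 4.0, 6.1], toy_library)
```

Because the angle ignores vector length, a spectrum and a uniformly brighter copy of it are treated as nearly identical, which is one reason the measure is cheap and robust to illumination scaling.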


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Gregoire Preud’homme ◽  
Kevin Duarte ◽  
Kevin Dalleau ◽  
Claire Lacomblez ◽  
Emmanuel Bresso ◽  
...  

The choice of the most appropriate unsupervised machine-learning method for “heterogeneous” or “mixed” data, i.e. with both continuous and categorical variables, can be challenging. Our aim was to examine the performance of various clustering strategies for mixed data using both simulated and real-life data. We conducted a benchmark analysis of “ready-to-use” tools in R comparing 4 model-based (Kamila algorithm, Latent Class Analysis, Latent Class Model [LCM] and Clustering by Mixture Modeling) and 5 distance/dissimilarity-based (Gower distance or Unsupervised Extra Trees dissimilarity followed by hierarchical clustering or Partitioning Around Medoids, K-prototypes) clustering methods. Clustering performance was assessed by the Adjusted Rand Index (ARI) on 1000 generated virtual populations consisting of mixed variables using 7 scenarios with varying population sizes, number of clusters, number of continuous and categorical variables, proportions of relevant (non-noisy) variables and degree of variable relevance (low, mild, high). Clustering methods were then applied to the EPHESUS randomized clinical trial data (a heart failure trial evaluating the effect of eplerenone) to illustrate the differences between clustering techniques. The simulations revealed the dominance of K-prototypes, Kamila and LCM models over all other methods. Overall, methods using dissimilarity matrices in classical algorithms such as Partitioning Around Medoids and Hierarchical Clustering had a lower ARI compared to model-based methods in all scenarios. When applying clustering methods to a real-life clinical dataset, LCM showed promising results with regard to differences in (1) clinical profiles across clusters, (2) prognostic performance (highest C-index) and (3) identification of patient subgroups with substantial treatment benefit.
The present findings suggest key differences in clustering performance between the tested algorithms (limited to tools readily available in R). In most of the tested scenarios, model-based methods (in particular the Kamila and LCM packages) and K-prototypes typically performed best in the setting of heterogeneous data.
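The Adjusted Rand Index used as the performance measure above can be computed from pair counts in the contingency table of two labelings. The study itself used tools in R; the following Python sketch of the standard ARI formula is only an illustration, and the function name `adjusted_rand_index` is an assumption of this example.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Chance-corrected agreement between two partitions of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to disagree on
    # Pairs of items placed together in both partitions.
    cells = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    # Pairs placed together within each partition separately.
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions trivial
    return (sum_cells - expected) / (max_index - expected)
```

An ARI of 1 means the two partitions agree up to a relabeling of clusters, values near 0 correspond to chance-level agreement, and negative values indicate less agreement than expected by chance.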


Author(s):  
Himanshu Mittal ◽  
Avinash Chandra Pandey ◽  
Mukesh Saraswat ◽  
Sumit Kumar ◽  
Raju Pal ◽  
...  
