Quantum Clustering Analysis: Minima of the Potential Energy Function

Mapping Intimacies ◽

10.5121/csit.2020.101914 ◽

2020 ◽

Author(s):

Aude Maignan ◽

Tony Scott

Keyword(s):

Clustering Analysis ◽

Clustering Algorithm ◽

Potential Energy Function ◽

Numerical Approach ◽

Quantum Potential ◽

Solid State Physics ◽

Exponential Polynomials ◽

Clustering Problem ◽

Quantum Clustering ◽

Number Of Particles

Quantum clustering (QC) is a data clustering algorithm based on quantum mechanics, which works by replacing each point in a given dataset with a Gaussian. The width of the Gaussian is a σ value, a hyper-parameter which can be manually defined and manipulated to suit the application. Numerical methods are used to find all the minima of the quantum potential, as they correspond to cluster centers. Herein, we investigate the mathematical task of expressing and finding all the roots of the exponential polynomial corresponding to the minima of a two-dimensional quantum potential. This is an outstanding task because normally such expressions are impossible to solve analytically. However, we prove that if the points are all included in a square region of size σ, there is only one minimum. This bound is useful not only for limiting the number of solutions to look for by numerical means; it also allows us to propose a new numerical approach “per block”. This technique decreases the number of particles (or samples) by approximating some groups of particles by weighted particles. These findings are useful not only for the quantum clustering problem but also for the exponential polynomials encountered in quantum chemistry, solid-state physics and other applications.
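
To make the "one Gaussian per point" construction concrete, the following is a minimal sketch of how the quantum potential could be evaluated at a query location. It assumes the standard Parzen-window wave function psi(x) = sum_i exp(-|x - x_i|^2 / (2 sigma^2)) and returns the potential only up to the additive constant E; the function name `quantum_potential` and its arguments are illustrative and not taken from the paper.

```python
import math


def quantum_potential(x, points, sigma):
    """Quantum potential at x, up to the additive constant E.

    Assumes psi(x) = sum_i exp(-|x - x_i|^2 / (2 sigma^2)), which gives
    V(x) - E = -d/2 + sum_i |x - x_i|^2 g_i / (2 sigma^2 psi(x)),
    where g_i is the Gaussian centred on point i and d is the dimension.
    """
    d = len(x)
    psi = 0.0       # sum of Gaussians
    weighted = 0.0  # sum of squared distances weighted by the Gaussians
    for p in points:
        sq = sum((a - b) ** 2 for a, b in zip(x, p))
        g = math.exp(-sq / (2.0 * sigma ** 2))
        psi += g
        weighted += sq * g
    # Far from every data point psi can underflow to 0; a production
    # version would need to guard against that division.
    return -d / 2.0 + weighted / (2.0 * sigma ** 2 * psi)
```

Candidate cluster centers are then the local minima of this function, which in practice have to be located numerically, for example by running a local optimizer from several starting points.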

A Comprehensive Analysis of Quantum Clustering : Finding All the Potential Minima

International Journal of Data Mining & Knowledge Management Process ◽

10.5121/ijdkp.2021.11103 ◽

2021 ◽

Vol 11 (1) ◽

pp. 33-54

Author(s):

Aude Maignan ◽

Tony Scott

Keyword(s):

Data Clustering ◽

Clustering Algorithm ◽

Numerical Approach ◽

Quantum Potential ◽

Mathematical Task ◽

Solid State Physics ◽

Exponential Polynomials ◽

Clustering Problem ◽

Quantum Clustering ◽

Number Of Particles

Quantum clustering (QC) is a data clustering algorithm based on quantum mechanics, which works by replacing each point in a given dataset with a Gaussian. The width of the Gaussian is a σ value, a hyper-parameter which can be manually defined and manipulated to suit the application. Numerical methods are used to find all the minima of the quantum potential, as they correspond to cluster centers. Herein, we investigate the mathematical task of expressing and finding all the roots of the exponential polynomial corresponding to the minima of a two-dimensional quantum potential. This is an outstanding task because normally such expressions are impossible to solve analytically. However, we prove that if the points are all included in a square region of size σ, there is only one minimum. This bound is useful not only for limiting the number of solutions to look for by numerical means; it also allows us to propose a new numerical approach “per block”. This technique decreases the number of particles by approximating some groups of particles by weighted particles. These findings are useful not only for the quantum clustering problem but also for the exponential polynomials encountered in quantum chemistry, solid-state physics and other applications.
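
The abstract does not spell out how groups of particles are turned into weighted particles, so the following is only a hypothetical illustration of the idea. It assumes a regular grid of square blocks of side σ and replaces the points in each block by their centroid, weighted by the number of points it stands for; the paper's actual "per block" scheme may differ. The name `block_reduce` is an assumption made for this sketch.

```python
import math
from collections import defaultdict


def block_reduce(points, sigma):
    """Replace 2-D points by one weighted particle per square block.

    Assumption (not from the paper): blocks are cells of a regular grid
    with side sigma; each occupied cell yields (centroid, point count).
    """
    blocks = defaultdict(list)
    for x, y in points:
        key = (math.floor(x / sigma), math.floor(y / sigma))
        blocks[key].append((x, y))

    reduced = []
    for members in blocks.values():
        n = len(members)
        cx = sum(p[0] for p in members) / n
        cy = sum(p[1] for p in members) / n
        reduced.append(((cx, cy), n))
    return reduced
```

The weights could then multiply the corresponding Gaussians when the potential is evaluated, so the cost scales with the number of occupied blocks rather than the number of points.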

Partition Quantitative Assessment (PQA): A Quantitative Methodology to Assess the Embedded Noise in Clustered Omics and Systems Biology Data

Applied Sciences ◽

10.3390/app11135999 ◽

2021 ◽

Vol 11 (13) ◽

pp. 5999

Author(s):

Diego A. Camacho-Hernández ◽

Victor E. Nieto-Caballero ◽

José E. León-Burguete ◽

Julio A. Freyre-González

Keyword(s):

Systems Biology ◽

Clustering Analysis ◽

Quantitative Assessment ◽

Clustering Algorithm ◽

Statistical Evaluation ◽

Quantitative Methodology ◽

Typical Problem ◽

Statistical Validation ◽

Common Features ◽

Biology Research

Identifying groups that share common features among datasets through clustering analysis is a typical problem in many fields of science, particularly in post-omics and systems biology research. In this respect, quantifying how well a measure can cluster or organize intrinsic groups is important, since there is currently no statistical evaluation of how ordered the resulting clustered vector is or how much noise is embedded in it. Much of the literature focuses on how well the clustering algorithm orders the data, with several measures for external and internal statistical validation, but no score has been developed to quantify statistically the noise in an arranged vector after a clustering algorithm has been applied, i.e., how much of the clustering is due to randomness. Here, we present a quantitative methodology, based on autocorrelation, to address this problem.

CLUSTERING USING AN IMPROVED HYBRID GENETIC ALGORITHM

International Journal of Artificial Intelligence Tools ◽

10.1142/s021821300700362x ◽

2007 ◽

Vol 16 (06) ◽

pp. 919-934

Author(s):

YONGGUO LIU ◽

XIAORONG PU ◽

YIDONG SHEN ◽

ZHANG YI ◽

XIAOFENG LIAO

Keyword(s):

Genetic Algorithm ◽

Clustering Algorithm ◽

Hybrid Genetic Algorithm ◽

Sum Of Squares ◽

Clustering Methods ◽

Clustering Problem ◽

Mutation Operation ◽

Iteration Methods ◽

Genetic Clustering ◽

The Individual

In this article, a new genetic clustering algorithm called the Improved Hybrid Genetic Clustering Algorithm (IHGCA) is proposed to deal with the clustering problem under the criterion of minimum sum of squares clustering. In IHGCA, an improvement operation comprising five local iteration methods is developed to tune the individual and accelerate the convergence of the clustering algorithm, and a partition-absorption mutation operation is designed to reassign objects among different clusters. Experimental simulations demonstrate its superiority over some known genetic clustering methods.

K-Nearest Neighbor Intervals Based AP Clustering Algorithm for Large Incomplete Data

Mathematical Problems in Engineering ◽

10.1155/2015/535932 ◽

2015 ◽

Vol 2015 ◽

pp. 1-9 ◽

Cited By ~ 2

Author(s):

Cheng Lu ◽

Shiji Song ◽

Cheng Wu

Keyword(s):

Clustering Analysis ◽

Incomplete Data ◽

Clustering Algorithm ◽

Nearest Neighbor ◽

Interval Data ◽

Similarity Function ◽

K Nearest Neighbor ◽

Partial Data ◽

Missing Attributes ◽

Ap Clustering

The Affinity Propagation (AP) algorithm is an effective algorithm for clustering analysis, but it cannot be applied directly to incomplete data. In view of the prevalence of missing data and the uncertainty of missing attributes, we put forward a modified AP clustering algorithm based on K-nearest neighbor intervals (KNNI) for incomplete data. Based on an Improved Partial Data Strategy, the proposed algorithm estimates the KNNI representation of missing attributes by using the attribute distribution information of the available data. The similarity function is then adapted to handle the interval data, so that the improved AP algorithm can be applied to incomplete data. Experiments on several UCI datasets show that the proposed algorithm achieves impressive clustering results.

Clustering Algorithm for Vehicle’s Driving Data Feature based on Integrated Navigation

International Journal of Vehicle Structures and Systems ◽

10.4273/ijvss.13.4.14 ◽

2021 ◽

Vol 13 (4) ◽

Author(s):

Na Guo ◽

Yiyi Zhu

Keyword(s):

Clustering Algorithm ◽

Principal Component ◽

Kernel Principal Component Analysis ◽

Integrated Navigation ◽

Clustering Problem ◽

Incremental Method ◽

Feature Parameters ◽

Vehicle Acceleration ◽

Feature Based ◽

Clustering Problems

The clustering result of the K-means clustering algorithm is affected by the initial cluster centers, and the result is not always globally optimal. Therefore, the clustering analysis of vehicle driving data features based on integrated navigation is carried out with the global K-means clustering algorithm. A vehicle mathematical model based on GPS/DR integrated navigation is constructed, and the vehicle’s driving data, such as vehicle acceleration, are collected. After the driving data features are extracted, the feature parameters are reduced in dimension by kernel principal component analysis to reduce their redundancy. The global K-means clustering algorithm converts the clustering problem into a series of sub-cluster clustering problems. At the end of each iteration, an incremental method is used to select the next set of optimal initial centers. After the optimal number of clusters is determined, the feature clustering of the vehicle’s driving data is completed. The experimental results show that the global K-means clustering algorithm has a clustering error of only 1.37% on the vehicle’s driving data features and achieves high-precision clustering.
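
The incremental scheme described in the abstract can be illustrated with a small, dependency-free sketch of the general global K-means idea: solve the problem for 1, 2, ..., k clusters in turn, and at each step try every data point as the starting position of the newly added center. This is a generic sketch under that assumption, not the authors' implementation; it omits their kernel PCA step, and the helper names are illustrative.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeans(points, centers, iters=100):
    """Plain Lloyd iterations from given centers; returns (centers, error)."""
    centers = [tuple(c) for c in centers]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[j].append(p)
        new_centers = []
        for c, g in zip(centers, groups):
            if g:
                new_centers.append(tuple(sum(col) / len(g) for col in zip(*g)))
            else:
                new_centers.append(c)  # leave an empty cluster's center as is
        if new_centers == centers:
            break
        centers = new_centers
    error = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, error


def global_kmeans(points, k_max):
    """Grow from 1 to k_max centers, testing each point as the new center."""
    best = [tuple(sum(col) / len(points) for col in zip(*points))]
    for _ in range(2, k_max + 1):
        runs = [kmeans(points, best + [p]) for p in points]
        best, _ = min(runs, key=lambda r: r[1])
    return best
```

Each added cluster costs one K-means run per data point, which removes the dependence on random initialization at the price of considerably more computation.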

MD-SPKM: A set pair k-modes clustering algorithm for incomplete categorical matrix data

Intelligent Data Analysis ◽

10.3233/ida-205340 ◽

2021 ◽

Vol 25 (6) ◽

pp. 1507-1524

Author(s):

Chunying Zhang ◽

Ruiyan Gao ◽

Jiahao Wang ◽

Song Chen ◽

Fengchun Liu ◽

...

Keyword(s):

Measurement Method ◽

Clustering Algorithm ◽

Average Distance ◽

Boundary Region ◽

Data Sets ◽

Calculation Formula ◽

Information Granule ◽

Clustering Problem ◽

Definition Of ◽

Multiple Clusters

In order to solve the clustering problem for incomplete categorical matrix data sets, and considering the uncertain relationship between samples and clusters, a set pair k-modes clustering algorithm (MD-SPKM) is proposed. Firstly, the correlation theory of the set pair information granule is introduced into k-modes clustering. By improving the distance formula of the traditional k-modes algorithm, a set pair distance measurement method between incomplete matrix samples is defined. Secondly, considering the uncertain relationship between the sample and the cluster, the definition of the intra-cluster average distance and the threshold calculation formula used to determine whether a sample belongs to multiple clusters are given, and the set pair clustering result is then formed, comprising a positive region, a boundary region and a negative region. Finally, an experimental evaluation on three selected data sets against four contrast algorithms shows that the set pair k-modes clustering algorithm can effectively handle incomplete categorical matrix data sets and performs well in terms of Accuracy, Recall, ARI and NMI.

A Data Distribution View of Clustering Algorithms

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch059 ◽

2011 ◽

pp. 374-381 ◽

Cited By ~ 1

Author(s):

Junjie Wu ◽

Jian Chen ◽

Hui Xiong

Keyword(s):

Data Mining ◽

Cluster Analysis ◽

Clustering Analysis ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Data Distribution ◽

Point Of View ◽

Group Method ◽

Data Sets ◽

Distribution Point

Cluster analysis (Jain & Dubes, 1988) provides insight into the data by dividing the objects into groups (clusters), such that objects in a cluster are more similar to each other than to objects in other clusters. Cluster analysis has long played an important role in a wide variety of fields, such as psychology, bioinformatics, pattern recognition, information retrieval, machine learning, and data mining. Many clustering algorithms, such as K-means and Unweighted Pair Group Method with Arithmetic Mean (UPGMA), have been well established. A recent research focus in clustering analysis is to understand the strengths and weaknesses of various clustering algorithms with respect to data factors. Indeed, people have identified some data characteristics that may strongly affect clustering analysis, including high dimensionality and sparseness, large size, noise, types of attributes and data sets, and scales of attributes (Tan, Steinbach, & Kumar, 2005). However, further investigation is needed to reveal whether and how the data distributions can affect the performance of clustering algorithms. Along this line, we study clustering algorithms by answering three questions: 1. What are the systematic differences between the distributions of the resultant clusters produced by different clustering algorithms? 2. How can the distribution of the “true” cluster sizes affect the performance of clustering algorithms? 3. How should an appropriate clustering algorithm be chosen in practice? The answers to these questions can guide us toward a better understanding and use of clustering methods. This is noteworthy, since 1) in theory, people have seldom realized that there are strong relationships between the clustering algorithms and the cluster size distributions, and 2) in practice, how to choose an appropriate clustering algorithm is still a challenging task, especially after an algorithm boom in the data mining area. This chapter thus makes an initial attempt to fill this void.
To this end, we carefully select two widely used categories of clustering algorithms, i.e., K-means and Agglomerative Hierarchical Clustering (AHC), as the representative algorithms for illustration. In the chapter, we first show that K-means tends to generate the clusters with a relatively uniform distribution on the cluster sizes. Then we demonstrate that UPGMA, one of the robust AHC methods, acts in an opposite way to K-means; that is, UPGMA tends to generate the clusters with high variation on the cluster sizes. Indeed, the experimental results indicate that the variations of the resultant cluster sizes by K-means and UPGMA, measured by the Coefficient of Variation (CV), are in the specific intervals, say [0.3, 1.0] and [1.0, 2.5] respectively. Finally, we put together K-means and UPGMA for a further comparison, and propose some rules for the better choice of the clustering schemes from the data distribution point of view.
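
As a small illustration of the measure mentioned above, the Coefficient of Variation of the cluster sizes is the standard deviation of the sizes divided by their mean. The sketch below assumes the sample standard deviation (n - 1 in the denominator); the chapter may use a different convention, and the function name `cluster_size_cv` is illustrative.

```python
import statistics
from collections import Counter


def cluster_size_cv(labels):
    """Coefficient of Variation of cluster sizes, given one label per object.

    Assumes the sample standard deviation, so at least two clusters
    must be present in `labels`.
    """
    sizes = list(Counter(labels).values())
    return statistics.stdev(sizes) / statistics.mean(sizes)
```

Perfectly balanced clusters give a CV of 0, and the value grows as the cluster sizes become more uneven, which is what makes it usable for comparing the size distributions produced by different algorithms.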

An Improved Routing Schema with Special Clustering Using PSO Algorithm for Heterogeneous Wireless Sensor Network

Sensors ◽

10.3390/s19030671 ◽

2019 ◽

Vol 19 (3) ◽

pp. 671 ◽

Cited By ~ 66

Author(s):

Jin Wang ◽

Yu Gao ◽

Wei Liu ◽

Arun Sangaiah ◽

Hye-Jin Kim

Keyword(s):

Energy Efficiency ◽

Clustering Algorithm ◽

Pso Algorithm ◽

Energy Utilization ◽

Wireless Sensor ◽

Energy Balancing ◽

Research Issues ◽

Protection Mechanism ◽

Clustering Problem ◽

Heterogeneous Wireless Sensor Network

Energy efficiency and energy balancing are crucial research issues in routing protocol design for self-organized wireless sensor networks (WSNs). Many studies have used clustering algorithms to achieve energy efficiency and energy balancing; however, there are usually energy holes near the cluster heads (CHs) because of the heavy burden of forwarding. As the clustering problem in lossy WSNs has been proved to be NP-hard, many metaheuristic algorithms are utilized to solve it. In this paper, a special clustering method called Energy Centers Searching using Particle Swarm Optimization (EC-PSO) is presented to avoid these energy holes and to search for energy centers for CH selection. During the first period, the CHs are elected using a geometric method. Once the energy of the network becomes heterogeneous, EC-PSO is adopted for clustering. Energy centers are searched for using an improved PSO algorithm, and nodes close to the energy center are elected as CHs. Additionally, a protection mechanism is used to prevent low-energy nodes from acting as forwarders, and a mobile data collector is introduced to gather the data. We conduct numerous simulations to show that the presented EC-PSO outperforms some similar works in terms of network lifetime enhancement and energy utilization ratio.

Memetic Variable Clustering and Its Application

Mathematical Problems in Engineering ◽

10.1155/2019/4195318 ◽

2019 ◽

Vol 2019 ◽

pp. 1-15

Author(s):

JiaCheng Ni ◽

Li Li

Keyword(s):

Particle Swarm Optimization ◽

Local Search ◽

Clustering Analysis ◽

Clustering Algorithm ◽

Particle Swarm ◽

Optimization Techniques ◽

Experimental Result ◽

Swarm Optimization ◽

Metaheuristic Optimization ◽

Variable Clustering

Clustering analysis is an important and difficult task in data mining and big data analysis. Although it is a widely used clustering analysis technique, variable clustering has not received enough attention in previous studies. Inspired by the metaheuristic optimization techniques developed for clustering data items, we try to overcome the main shortcoming of the k-means-based variable clustering algorithm, namely its sensitivity to initial centroids, by introducing metaheuristic optimization. A novel memetic algorithm named MCLPSO (Memetic Comprehensive Learning Particle Swarm Optimization), based on CLPSO (Comprehensive Learning Particle Swarm Optimization), was studied under the framework of memetic computing in our previous work. In this work, MCLPSO is used as a metaheuristic approach to improve the k-means-based variable clustering algorithm by adjusting the initial centroids iteratively to maximize the homogeneity of the clustering results. In MCLPSO, a chaotic local search operator is used, and a simulated annealing (SA) based local search strategy is developed by combining the cognition-only PSO model with SA. The adaptive memetic strategy enables stagnant particles that cannot be improved by the comprehensive learning strategy to escape from local optima, and enables some elite particles to perform fine-grained local search around promising regions. The experimental results demonstrate the good performance of MCLPSO in optimizing the variable clustering criterion on several datasets compared with the original variable clustering method. Finally, for practical use, we also developed a web-based interactive software platform for the proposed approach and present a practical case study, analyzing the performance of a semiconductor manufacturing system, to demonstrate its usage.

Emulation of high-performance correlation-based quantum clustering algorithm for two-dimensional data on FPGA

Quantum Information Processing ◽

10.1007/s11128-020-02683-9 ◽

2020 ◽

Vol 19 (6) ◽

Author(s):

Talal Bonny ◽

A. Haq

Keyword(s):

High Performance ◽

Clustering Algorithm ◽

Two Dimensional ◽

Quantum Clustering
