A Comprehensive Analysis of Quantum Clustering: Finding All the Potential Minima

Aude Maignan; Tony Scott

doi:10.5121/ijdkp.2021.11103

International Journal of Data Mining & Knowledge Management Process ◽

10.5121/ijdkp.2021.11103 ◽

2021 ◽

Vol 11 (1) ◽

pp. 33-54

Author(s):

Aude Maignan ◽

Tony Scott

Keyword(s):

Data Clustering ◽

Clustering Algorithm ◽

Numerical Approach ◽

Quantum Potential ◽

Mathematical Task ◽

Solid State Physics ◽

Exponential Polynomials ◽

Clustering Problem ◽

Quantum Clustering ◽

Number Of Particles

Quantum clustering (QC) is a data clustering algorithm based on quantum mechanics, in which each point in a given dataset is replaced by a Gaussian. The width of the Gaussian is set by σ, a hyper-parameter that can be manually defined and tuned to suit the application. Numerical methods are used to find all the minima of the quantum potential, as they correspond to cluster centers. Herein, we investigate the mathematical task of expressing and finding all the roots of the exponential polynomial corresponding to the minima of a two-dimensional quantum potential. This is an outstanding task because normally such expressions are impossible to solve analytically. However, we prove that if the points are all included in a square region of size σ, there is only one minimum. This bound is useful not only for limiting the number of solutions to look for by numerical means; it also allows us to propose a new numerical approach “per block”. This technique decreases the number of particles by approximating some groups of particles by weighted particles. These findings are useful not only for the quantum clustering problem but also for the exponential polynomials encountered in quantum chemistry, solid-state physics and other applications.
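As a rough illustration of the quantities this abstract refers to, the sketch below evaluates the quantum potential in the form commonly used in the quantum clustering literature: a sum of Gaussians ψ is inserted into the Schrödinger equation, giving V = E + (σ²/2)∇²ψ/ψ. That formulation, the function name `quantum_potential`, and the omission of the additive constant E are assumptions made for illustration; they are not taken from the article.

```python
import numpy as np

def quantum_potential(x, data, sigma):
    """Quantum potential V(x) - E for a sum-of-Gaussians wavefunction.

    x     : (m, d) array-like of evaluation points
    data  : (n, d) array of data points (one Gaussian per point)
    sigma : Gaussian width (the hyper-parameter of the abstract)
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    d = data.shape[1]
    # Squared distance from every evaluation point to every data point.
    sq = ((x[:, None, :] - data[None, :, :]) ** 2).sum(axis=2)
    gauss = np.exp(-sq / (2.0 * sigma ** 2))
    psi = gauss.sum(axis=1)
    # (sigma^2 / 2) * laplacian(psi) / psi, written out for Gaussians.
    return -d / 2.0 + (sq * gauss).sum(axis=1) / (2.0 * sigma ** 2 * psi)

# Two points that fit inside a square of side sigma: the midpoint lies
# lower than either data point, in line with the single-minimum statement.
pair = np.array([[-0.25, 0.0], [0.25, 0.0]])
print(quantum_potential([[0.0, 0.0], [0.25, 0.0]], pair, sigma=1.0))
```

Candidate cluster centers are the local minima of this function, so in practice the function would be handed to a numerical minimiser started from several points.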

Quantum Clustering Analysis: Minima of the Potential Energy Function

10.5121/csit.2020.101914 ◽

2020 ◽

Author(s):

Aude Maignan ◽

Tony Scott

Keyword(s):

Clustering Analysis ◽

Clustering Algorithm ◽

Potential Energy Function ◽

Numerical Approach ◽

Quantum Potential ◽

Solid State Physics ◽

Exponential Polynomials ◽

Clustering Problem ◽

Quantum Clustering ◽

Number Of Particles

Quantum clustering (QC) is a data clustering algorithm based on quantum mechanics, in which each point in a given dataset is replaced by a Gaussian. The width of the Gaussian is set by σ, a hyper-parameter that can be manually defined and tuned to suit the application. Numerical methods are used to find all the minima of the quantum potential, as they correspond to cluster centers. Herein, we investigate the mathematical task of expressing and finding all the roots of the exponential polynomial corresponding to the minima of a two-dimensional quantum potential. This is an outstanding task because normally such expressions are impossible to solve analytically. However, we prove that if the points are all included in a square region of size σ, there is only one minimum. This bound is useful not only for limiting the number of solutions to look for by numerical means; it also allows us to propose a new numerical approach “per block”. This technique decreases the number of particles (or samples) by approximating some groups of particles by weighted particles. These findings are useful not only for the quantum clustering problem but also for the exponential polynomials encountered in quantum chemistry, solid-state physics and other applications.
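The abstract does not say how groups of particles are turned into weighted particles. One plausible reading, sketched below purely as an assumption, is to bin the samples into square blocks and replace each block by its centroid, weighted by the number of samples it contains. The names `block_reduce` and `weighted_potential`, the grid binning, and the count-based weights are all hypothetical choices for illustration.

```python
import numpy as np

def block_reduce(data, block_size):
    """Replace the samples in each grid block by one count-weighted centroid."""
    keys = np.floor(data / block_size).astype(int)   # block index of each sample
    _, inverse, counts = np.unique(
        keys, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.ravel()
    centroids = np.zeros((counts.size, data.shape[1]))
    np.add.at(centroids, inverse, data)               # sum the samples per block
    centroids /= counts[:, None]
    return centroids, counts.astype(float)

def weighted_potential(x, centers, weights, sigma):
    """Quantum potential (minus the constant E) for a weighted sum of Gaussians."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    d = centers.shape[1]
    sq = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    gauss = weights[None, :] * np.exp(-sq / (2.0 * sigma ** 2))
    return -d / 2.0 + (sq * gauss).sum(axis=1) / (2.0 * sigma ** 2 * gauss.sum(axis=1))

data = np.array([[0.1, 0.1], [0.2, 0.3], [1.5, 1.5]])
centers, weights = block_reduce(data, block_size=1.0)
print(centers, weights)
```

The potential is then evaluated over the reduced set of weighted centers, so its cost scales with the number of occupied blocks and not with the number of samples.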

Big Data Clustering with Kernel k-Means: Resources, Time and Performance

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213018600060 ◽

2018 ◽

Vol 27 (04) ◽

pp. 1860006

Author(s):

Nikolaos Tsapanos ◽

Anastasios Tefas ◽

Nikolaos Nikolaidis ◽

Ioannis Pitas

Keyword(s):

Big Data ◽

Data Clustering ◽

Clustering Algorithm ◽

Learning Task ◽

Related Data ◽

Clustering Problem ◽

Processing Power ◽

Trade Offs ◽

Separable Kernel ◽

And Performance

Data clustering is an unsupervised learning task that has found many applications in various scientific fields. The goal is to find subgroups of closely related data samples (clusters) in a set of unlabeled data. A classic clustering algorithm is the so-called k-Means. It is very popular; however, it cannot handle cases in which the clusters are not linearly separable. Kernel k-Means is a state-of-the-art clustering algorithm that employs the kernel trick in order to perform clustering in a higher-dimensional space, thus overcoming the limitations of classic k-Means regarding the non-linear separability of the input data. With respect to the challenges of Big Data research, a field that has established itself in the last few years and involves performing tasks on extremely large amounts of data, several adaptations of Kernel k-Means have been proposed, each of which has different requirements in processing power and running time, while also incurring different trade-offs in performance. In this paper, we present several issues and techniques involving the usage of Kernel k-Means for Big Data clustering and how the combination of each component in a clustering framework fares in terms of resources, time and performance. We use experimental results to evaluate several combinations and provide a recommendation on how to approach a Big Data clustering problem.
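To make the kernel trick mentioned above concrete, here is a minimal sketch of plain (non-scalable) Kernel k-Means. It uses only the kernel matrix K to compute feature-space distances to cluster means. The RBF kernel, the function names, and the fixed initial labelling are assumptions for illustration; none of the Big Data adaptations discussed in the paper are reproduced here.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq)

def kernel_kmeans(K, labels, n_clusters, n_iter=50):
    """Lloyd-style iterations using only the kernel matrix K."""
    n = K.shape[0]
    labels = np.asarray(labels).copy()
    for _ in range(n_iter):
        dist = np.full((n, n_clusters), np.inf)
        for c in range(n_clusters):
            members = labels == c
            size = members.sum()
            if size == 0:
                continue  # an empty cluster attracts no points
            # ||phi(x_i) - mean_c||^2 expanded in terms of kernel values
            dist[:, c] = (
                np.diag(K)
                - 2.0 * K[:, members].sum(axis=1) / size
                + K[np.ix_(members, members)].sum() / size ** 2
            )
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

X = np.array([[0.0, 0.0], [0.0, 0.1], [0.1, 0.0],
              [5.0, 5.0], [5.0, 5.1], [5.1, 5.0]])
labels = kernel_kmeans(rbf_kernel(X, gamma=1.0), [0, 0, 1, 1, 1, 0], n_clusters=2)
print(labels)
```

Storing the full n × n kernel matrix is what makes this naive form impractical for very large datasets, which is the resource trade-off the abstract refers to.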

Universality of Logarithmic Loss in Fixed-Length Lossy Compression

Entropy ◽

10.3390/e21060580 ◽

2019 ◽

Vol 21 (6) ◽

pp. 580

Author(s):

Albert No

Keyword(s):

Categorical Data ◽

Data Clustering ◽

Clustering Algorithm ◽

Original Problem ◽

Lossy Compression ◽

Finite Alphabet ◽

Clustering Problem ◽

Fixed Length ◽

Logarithmic Loss ◽

Categorical Data Clustering

We established a universality of logarithmic loss over a finite alphabet as a distortion criterion in fixed-length lossy compression. For any fixed-length lossy-compression problem under an arbitrary distortion criterion, we show that there is an equivalent lossy-compression problem under logarithmic loss. The equivalence is in the strong sense that we show that finding good schemes in corresponding lossy compression under logarithmic loss is essentially equivalent to finding good schemes in the original problem. This equivalence relation also provides an algebraic structure in the reconstruction alphabet, which allows us to use known techniques in the clustering literature. Furthermore, our result naturally suggests a new clustering algorithm in the categorical data-clustering problem.

Fuzzy Rules for Ant Based Clustering Algorithm

Advances in Fuzzy Systems ◽

10.1155/2016/8198915 ◽

2016 ◽

Vol 2016 ◽

pp. 1-16 ◽

Cited By ~ 2

Author(s):

Amira Hamdi ◽

Nicolas Monmarché ◽

Mohamed Slimane ◽

Adel M. Alimi

Keyword(s):

Data Clustering ◽

Clustering Algorithm ◽

Shortest Paths ◽

Second Step ◽

Ant System ◽

Graph Data ◽

Clustering Problem ◽

Intelligent Technique ◽

Artificial Ants ◽

Fcm Clustering

This paper provides a new intelligent technique for the semisupervised data clustering problem that combines the Ant System (AS) algorithm with the fuzzy c-means (FCM) clustering algorithm. Our proposed approach, called the F-ASClass algorithm, is a distributed algorithm inspired by the foraging behavior observed in ant colonies. The ability of ants to find the shortest path forms the basis of our proposed approach. In the first step, several colonies of cooperating entities, called artificial ants, are used to find shortest paths in a complete graph that we call graph-data. The number of colonies used in F-ASClass is equal to the number of clusters in the dataset. The partition matrix of the dataset found by the artificial ants is then given, in the second step, to the fuzzy c-means technique in order to assign the unclassified objects generated in the first step. The proposed approach is tested on artificial and real datasets, and its performance is compared with those of the K-means, K-medoid, and FCM algorithms. The experimental section shows that F-ASClass performs better according to the error rate classification, accuracy, and separation index.

Balanced Data Clustering Algorithm for Both Hard and Soft Clustering

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v6i2.176183 ◽

2018 ◽

Vol 6 (2) ◽

pp. 176-183

Author(s):

Purnendu Das ◽

Bishwa Ranjan Roy ◽

Saptarshi Paul ◽

...

Keyword(s):

Data Clustering ◽

Clustering Algorithm ◽

Soft Clustering

CLUSTERING USING AN IMPROVED HYBRID GENETIC ALGORITHM

International Journal of Artificial Intelligence Tools ◽

10.1142/s021821300700362x ◽

2007 ◽

Vol 16 (06) ◽

pp. 919-934

Author(s):

YONGGUO LIU ◽

XIAORONG PU ◽

YIDONG SHEN ◽

ZHANG YI ◽

XIAOFENG LIAO

Keyword(s):

Genetic Algorithm ◽

Clustering Algorithm ◽

Hybrid Genetic Algorithm ◽

Sum Of Squares ◽

Clustering Methods ◽

Clustering Problem ◽

Mutation Operation ◽

Iteration Methods ◽

Genetic Clustering ◽

The Individual

In this article, a new genetic clustering algorithm called the Improved Hybrid Genetic Clustering Algorithm (IHGCA) is proposed to deal with the clustering problem under the criterion of minimum sum of squares clustering. In IHGCA, the improvement operation including five local iteration methods is developed to tune the individual and accelerate the convergence speed of the clustering algorithm, and the partition-absorption mutation operation is designed to reassign objects among different clusters. By experimental simulations, its superiority over some known genetic clustering methods is demonstrated.
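For reference, the minimum sum-of-squares clustering criterion named above is usually written as follows. The notation (K clusters C_k with centroids m_k) is generic and assumed here; it is not taken from the article.

```latex
J(C_1,\dots,C_K) = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - m_k \rVert^2,
\qquad
m_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
```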

Tree-ART2 Learning Model for Spatial Clustering in Second Dimension

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.543-547.1934 ◽

2014 ◽

Vol 543-547 ◽

pp. 1934-1938

Author(s):

Ming Xiao

Keyword(s):

Network Model ◽

Spatial Data ◽

Data Clustering ◽

Clustering Algorithm ◽

Spatial Clustering ◽

Adaptive Resonance Theory ◽

Spatial Distance ◽

Resonance Theory ◽

Adaptive Resonance ◽

Vector Module

As a clustering algorithm for two-dimensional spatial data, Adaptive Resonance Theory not only suffers from pattern drift and the loss of vector-module information, but also adapts poorly to irregularly distributed spatial data. A Tree-ART2 network model is proposed to address these shortcomings. It retains the memory of the old model, which maintains the constraint of spatial distance by learning and adjusting the LTM pattern and the amplitude information of the vector. Meanwhile, introducing a tree structure into the model can reduce the subjective requirement of the vigilance parameter and decrease the occurrence of pattern mixing. Comparative experiments show that the TART2 network has higher plasticity and adaptability.

Improved Fuzzy C-Means Clustering for Transformer Fault Diagnosis Using Dissolved Gas Analysis Data

Energies ◽

10.3390/en11092344 ◽

2018 ◽

Vol 11 (9) ◽

pp. 2344 ◽

Cited By ~ 6

Author(s):

Enwen Li ◽

Linong Wang ◽

Bin Song ◽

Siliang Jian

Keyword(s):

Fault Diagnosis ◽

Membership Function ◽

Data Clustering ◽

Clustering Algorithm ◽

Gas Analysis ◽

Dissolved Gas ◽

Fuzzy C Means ◽

Dissolved Gas Analysis ◽

Fcm Clustering ◽

Transformer Fault

Dissolved gas analysis (DGA) of the oil allows transformer fault diagnosis and status monitoring. Fuzzy c-means (FCM) clustering is an effective pattern recognition method, but it exhibits poor clustering accuracy for dissolved gas data and usually fails to classify transformer faults correctly afterwards. The existing feasible approach involves combining the FCM clustering algorithm with other intelligent algorithms, such as neural networks and support vector machines. This method enables good classification; however, the algorithm complexity is greatly increased. In this paper, the FCM clustering algorithm itself is improved and clustering analysis of DGA data is realized. First, the non-monotonicity of the traditional clustering membership function with respect to the sample distance and its several local extrema are discussed, which mainly explain the poor classification accuracy of DGA data clustering. Then, an exponential form of the membership function is proposed to obtain monotonicity with respect to distance, thereby improving the dissolved gas data clustering. Likewise, a similarity function to determine the degree of membership is derived. Test results for large datasets show that the improved clustering algorithm can be successfully applied for DGA-data-based transformer fault detection.
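The non-monotonicity mentioned above can be seen in the standard FCM membership formula, u_ij = d_ij^(-2/(m-1)) / Σ_k d_ik^(-2/(m-1)). The sketch below evaluates it for a toy one-dimensional case. The function name, the two centers, and the sample positions are assumptions for illustration; the exponential membership function proposed in the paper is not reproduced.

```python
import numpy as np

def fcm_membership(points, centers, m=2.0, eps=1e-12):
    """Standard fuzzy c-means memberships; each row sums to 1."""
    dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    dist = np.maximum(dist, eps)           # avoid division by zero at a center
    inv = dist ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

centers = np.array([[0.0], [1.0]])
points = np.array([[0.5], [1.5], [3.0]])   # increasing distance from center 0
u = fcm_membership(points, centers)
print(u[:, 0])
```

Membership in the first cluster drops as the sample approaches the second center and then rises again as the sample moves far from both centers, so it is not a monotonic function of the distance to the first center.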
