Optimization of the Numeric and Categorical Attribute Weights in KAMILA Mixed Data Clustering Algorithm

A Two-stage clustering framework and a clustering algorithm for mixed attribute data based on density peaks and Goodall distance are proposed. Firstly, the subset of numerical attributes of the dataset is clustered, and then the result is mapped into one-dimensional categorical attribute and added to the subset of categorical attribute data. Finally, the new dataset is clustered by the density peaks clustering algorithm to obtain the final result. Experiments on three commonly used UCI datasets show that this algorithm can effectively realize mixed attribute clustering and produce better clustering results than the traditional K-prototypes algorithm do. The clustering accuracy on the Acute, Heart and Credit datasets are 17%, 24%, and 21% higher on average than that of the K-prototypes, respectively.

Download Full-text

Balanced Data Clustering Algorithm for Both Hard and Soft Clustering

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v6i2.176183 ◽

2018 ◽

Vol 6 (2) ◽

pp. 176-183

Author(s):

Purnendu Das ◽

◽

Bishwa Ranjan Roy ◽

Saptarshi Paul ◽

◽

...

Keyword(s):

Data Clustering ◽

Clustering Algorithm ◽

Soft Clustering

Download Full-text

Visualizing Profiles of Large Datasets of Weighted and Mixed Data

Mathematics ◽

10.3390/math9080891 ◽

2021 ◽

Vol 9 (8) ◽

pp. 891

Author(s):

Aurea Grané ◽

Alpha A. Sow-Barry

Keyword(s):

Multidimensional Scaling ◽

Random Sample ◽

Simulation Study ◽

Clustering Algorithm ◽

Computational Cost ◽

Interpolation Formula ◽

Large Datasets ◽

Mixed Data ◽

Multivariate Techniques ◽

High Computational Cost

This work provides a procedure with which to construct and visualize profiles, i.e., groups of individuals with similar characteristics, for weighted and mixed data by combining two classical multivariate techniques, multidimensional scaling (MDS) and the k-prototypes clustering algorithm. The well-known drawback of classical MDS in large datasets is circumvented by selecting a small random sample of the dataset, whose individuals are clustered by means of an adapted version of the k-prototypes algorithm and mapped via classical MDS. Gower’s interpolation formula is used to project remaining individuals onto the previous configuration. In all the process, Gower’s distance is used to measure the proximity between individuals. The methodology is illustrated on a real dataset, obtained from the Survey of Health, Ageing and Retirement in Europe (SHARE), which was carried out in 19 countries and represents over 124 million aged individuals in Europe. The performance of the method was evaluated through a simulation study, whose results point out that the new proposal solves the high computational cost of the classical MDS with low error.

Download Full-text

Variable Selection for Mixed Data Clustering: Application in Human Population Genomics

Journal of Classification ◽

10.1007/s00357-018-9301-y ◽

2019 ◽

Vol 37 (1) ◽

pp. 124-142 ◽

Cited By ~ 1

Author(s):

Matthieu Marbac ◽

Mohammed Sedki ◽

Tienne Patin

Keyword(s):

Variable Selection ◽

Data Clustering ◽

Human Population ◽

Population Genomics ◽

Mixed Data ◽

Selection For

Download Full-text

Tree-ART2 Learning Model for Spatial Clustering in Second Dimension

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.543-547.1934 ◽

2014 ◽

Vol 543-547 ◽

pp. 1934-1938

Author(s):

Ming Xiao

Keyword(s):

Network Model ◽

Spatial Data ◽

Data Clustering ◽

Clustering Algorithm ◽

Spatial Clustering ◽

Adaptive Resonance Theory ◽

Spatial Distance ◽

Resonance Theory ◽

Adaptive Resonance ◽

Vector Module

For a clustering algorithm in two-dimension spatial data, the Adaptive Resonance Theory exists not only the shortcomings of pattern drift and vector module of information missing, but also difficultly adapts to spatial data clustering which is irregular distribution. A Tree-ART2 network model was proposed based on the above situation. It retains the memory of old model which maintains the constraint of spatial distance by learning and adjusting LTM pattern and amplitude information of vector. Meanwhile, introducing tree structure to the model can reduce the subjective requirement of vigilance parameter and decrease the occurrence of pattern mixing. It is showed that TART2 network has higher plasticity and adaptability through compared experiments.

Download Full-text

Improved Fuzzy C-Means Clustering for Transformer Fault Diagnosis Using Dissolved Gas Analysis Data

Energies ◽

10.3390/en11092344 ◽

2018 ◽

Vol 11 (9) ◽

pp. 2344 ◽

Cited By ~ 6

Author(s):

Enwen Li ◽

Linong Wang ◽

Bin Song ◽

Siliang Jian

Keyword(s):

Fault Diagnosis ◽

Membership Function ◽

Data Clustering ◽

Clustering Algorithm ◽

Gas Analysis ◽

Dissolved Gas ◽

Fuzzy C Means ◽

Dissolved Gas Analysis ◽

Fcm Clustering ◽

Transformer Fault

Dissolved gas analysis (DGA) of the oil allows transformer fault diagnosis and status monitoring. Fuzzy c-means (FCM) clustering is an effective pattern recognition method, but exhibits poor clustering accuracy for dissolved gas data and usually fails to subsequently correctly classify transformer faults. The existing feasible approach involves combination of the FCM clustering algorithm with other intelligent algorithms, such as neural networks and support vector machines. This method enables good classification; however, the algorithm complexity is greatly increased. In this paper, the FCM clustering algorithm itself is improved and clustering analysis of DGA data is realized. First, the non-monotonicity of the traditional clustering membership function with respect to the sample distance and its several local extrema are discussed, which mainly explain the poor classification accuracy of DGA data clustering. Then, an exponential form of the membership function is proposed to obtain monotony with respect to distance, thereby improving the dissolved gas data clustering. Likewise, a similarity function to determine the degree of membership is derived. Test results for large datasets show that the improved clustering algorithm can be successfully applied for DGA-data-based transformer fault detection.

Download Full-text

Genetic Algorithm Based Parallel K-Means Data Clustering Algorithm Using MapReduce Programming Paradigm on Hadoop Environment (GAPKCA)

Advances in Intelligent Systems and Computing - Recent Advances on Soft Computing and Data Mining ◽

10.1007/978-3-030-36056-6_10 ◽

2019 ◽

pp. 98-108 ◽

Cited By ~ 1

Author(s):

Sayer Alshammari ◽

Maslina Binti Zolkepli ◽

Rusli Bin Abdullah

Keyword(s):

Genetic Algorithm ◽

Data Clustering ◽

Clustering Algorithm ◽

Programming Paradigm

Download Full-text