Optimization of the Numeric and Categorical Attribute Weights in KAMILA Mixed Data Clustering Algorithm

Author(s):  
Nádia Junqueira Martarelli ◽  
Marcelo Seido Nagano
Author(s):  
Shihua Liu ◽  
Hao Zhang ◽  
Xianghua Liu

A Two-stage clustering framework and a clustering algorithm for mixed attribute data based on density peaks and Goodall distance are proposed. Firstly, the subset of numerical attributes of the dataset is clustered, and then the result is mapped into one-dimensional categorical attribute and added to the subset of categorical attribute data. Finally, the new dataset is clustered by the density peaks clustering algorithm to obtain the final result. Experiments on three commonly used UCI datasets show that this algorithm can effectively realize mixed attribute clustering and produce better clustering results than the traditional K-prototypes algorithm do. The clustering accuracy on the Acute, Heart and Credit datasets are 17%, 24%, and 21% higher on average than that of the K-prototypes, respectively.


2018 ◽  
Vol 6 (2) ◽  
pp. 176-183
Author(s):  
Purnendu Das ◽  
◽  
Bishwa Ranjan Roy ◽  
Saptarshi Paul ◽  
◽  
...  

Mathematics ◽  
2021 ◽  
Vol 9 (8) ◽  
pp. 891
Author(s):  
Aurea Grané ◽  
Alpha A. Sow-Barry

This work provides a procedure with which to construct and visualize profiles, i.e., groups of individuals with similar characteristics, for weighted and mixed data by combining two classical multivariate techniques, multidimensional scaling (MDS) and the k-prototypes clustering algorithm. The well-known drawback of classical MDS in large datasets is circumvented by selecting a small random sample of the dataset, whose individuals are clustered by means of an adapted version of the k-prototypes algorithm and mapped via classical MDS. Gower’s interpolation formula is used to project remaining individuals onto the previous configuration. In all the process, Gower’s distance is used to measure the proximity between individuals. The methodology is illustrated on a real dataset, obtained from the Survey of Health, Ageing and Retirement in Europe (SHARE), which was carried out in 19 countries and represents over 124 million aged individuals in Europe. The performance of the method was evaluated through a simulation study, whose results point out that the new proposal solves the high computational cost of the classical MDS with low error.


2014 ◽  
Vol 543-547 ◽  
pp. 1934-1938
Author(s):  
Ming Xiao

For a clustering algorithm in two-dimension spatial data, the Adaptive Resonance Theory exists not only the shortcomings of pattern drift and vector module of information missing, but also difficultly adapts to spatial data clustering which is irregular distribution. A Tree-ART2 network model was proposed based on the above situation. It retains the memory of old model which maintains the constraint of spatial distance by learning and adjusting LTM pattern and amplitude information of vector. Meanwhile, introducing tree structure to the model can reduce the subjective requirement of vigilance parameter and decrease the occurrence of pattern mixing. It is showed that TART2 network has higher plasticity and adaptability through compared experiments.


Energies ◽  
2018 ◽  
Vol 11 (9) ◽  
pp. 2344 ◽  
Author(s):  
Enwen Li ◽  
Linong Wang ◽  
Bin Song ◽  
Siliang Jian

Dissolved gas analysis (DGA) of the oil allows transformer fault diagnosis and status monitoring. Fuzzy c-means (FCM) clustering is an effective pattern recognition method, but exhibits poor clustering accuracy for dissolved gas data and usually fails to subsequently correctly classify transformer faults. The existing feasible approach involves combination of the FCM clustering algorithm with other intelligent algorithms, such as neural networks and support vector machines. This method enables good classification; however, the algorithm complexity is greatly increased. In this paper, the FCM clustering algorithm itself is improved and clustering analysis of DGA data is realized. First, the non-monotonicity of the traditional clustering membership function with respect to the sample distance and its several local extrema are discussed, which mainly explain the poor classification accuracy of DGA data clustering. Then, an exponential form of the membership function is proposed to obtain monotony with respect to distance, thereby improving the dissolved gas data clustering. Likewise, a similarity function to determine the degree of membership is derived. Test results for large datasets show that the improved clustering algorithm can be successfully applied for DGA-data-based transformer fault detection.


Sign in / Sign up

Export Citation Format

Share Document