The Discretization of Continuous Attributes Based on Improved SOM Clustering

2014 ◽  
Vol 701-702 ◽  
pp. 88-93 ◽  
Author(s):  
Gang Tao ◽  
Yong Gang Yan ◽  
Jiao Zou ◽  
Jun Liu

In order to solve the problem of continuous attribute discretization, a new improved SOM clustering algorithm was proposed. The algorithm uses the SOM to achieve the initial cluster and get the clustering up limit, then treats the initial cluster centers as samples and use the BIRCH hierarchical clustering algorithm to get secondary clustering, then solves the problems of inflated clusters and identifies discrete breakpoints set. Finally, find the nearest neighbors of each cluster center among these any samples of Breakpoints sets which belong to its attribute, and use it as a basis of discrete trimming. The experimental results show that the proposed algorithm outperforms the conventional discrete SOM clustering algorithm in the breakpoints set (contour factor to enhance 75%) and discrete accuracy (incompatible degrees closer to 0) aspects.

2021 ◽  
Vol 8 (10) ◽  
pp. 43-50
Author(s):  
Truong et al. ◽  

Clustering is a fundamental technique in data mining and machine learning. Recently, many researchers are interested in the problem of clustering categorical data and several new approaches have been proposed. One of the successful and pioneering clustering algorithms is the Minimum-Minimum Roughness algorithm (MMR) which is a top-down hierarchical clustering algorithm and can handle the uncertainty in clustering categorical data. However, MMR tends to choose the category with less value leaf node with more objects, leading to undesirable clustering results. To overcome such shortcomings, this paper proposes an improved version of the MMR algorithm for clustering categorical data, called IMMR (Improved Minimum-Minimum Roughness). Experimental results on actual data sets taken from UCI show that the IMMR algorithm outperforms MMR in clustering categorical data.


2020 ◽  
Vol 2020 ◽  
pp. 1-13
Author(s):  
Ziqi Jia ◽  
Ling Song

The k-prototypes algorithm is a hybrid clustering algorithm that can process Categorical Data and Numerical Data. In this study, the method of initial Cluster Center selection was improved and a new Hybrid Dissimilarity Coefficient was proposed. Based on the proposed Hybrid Dissimilarity Coefficient, a weighted k-prototype clustering algorithm based on the hybrid dissimilarity coefficient was proposed (WKPCA). The proposed WKPCA algorithm not only improves the selection of initial Cluster Centers, but also puts a new method to calculate the dissimilarity between data objects and Cluster Centers. The real dataset of UCI was used to test the WKPCA algorithm. Experimental results show that WKPCA algorithm is more efficient and robust than other k-prototypes algorithms.


2020 ◽  
Vol 6 (4) ◽  
pp. 431-443
Author(s):  
Xiaolong Yang ◽  
Xiaohong Jia

AbstractWe present a simple yet efficient algorithm for recognizing simple quadric primitives (plane, sphere, cylinder, cone) from triangular meshes. Our approach is an improved version of a previous hierarchical clustering algorithm, which performs pairwise clustering of triangle patches from bottom to top. The key contributions of our approach include a strategy for priority and fidelity consideration of the detected primitives, and a scheme for boundary smoothness between adjacent clusters. Experimental results demonstrate that the proposed method produces qualitatively and quantitatively better results than representative state-of-the-art methods on a wide range of test data.


2019 ◽  
Vol 2019 ◽  
pp. 1-10
Author(s):  
Yaohui Liu ◽  
Dong Liu ◽  
Fang Yu ◽  
Zhengming Ma

Clustering is widely used in data analysis, and density-based methods are developed rapidly in the recent 10 years. Although the state-of-art density peak clustering algorithms are efficient and can detect arbitrary shape clusters, they are nonsphere type of centroid-based methods essentially. In this paper, a novel local density hierarchical clustering algorithm based on reverse nearest neighbors, RNN-LDH, is proposed. By constructing and using a reverse nearest neighbor graph, the extended core regions are found out as initial clusters. Then, a new local density metric is defined to calculate the density of each object; meanwhile, the density hierarchical relationships among the objects are built according to their densities and neighbor relations. Finally, each unclustered object is classified to one of the initial clusters or noise. Results of experiments on synthetic and real data sets show that RNN-LDH outperforms the current clustering methods based on density peak or reverse nearest neighbors.


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Ross Mawhorter ◽  
Ran Libeskind-Hadas

Abstract Background Maximum parsimony reconciliation in the duplication-transfer-loss model is a widely-used method for analyzing the evolutionary histories of pairs of entities such as hosts and parasites, symbiont species, and species and genes. While efficient algorithms are known for finding maximum parsimony reconciliations, the number of such reconciliations can be exponential in the size of the trees. Since these reconciliations can differ substantially from one another, making inferences from any one reconciliation may lead to conclusions that are not supported, or may even be contradicted, by other maximum parsimony reconciliations. Therefore, there is a need to find small sets of best representative reconciliations when the space of solutions is large and diverse. Results We provide a general framework for hierarchical clustering the space of maximum parsimony reconciliations. We demonstrate this framework for two specific linkage criteria, one that seeks to maximize the average support of the events found in the reconciliations in each cluster and the other that seeks to minimize the distance between reconciliations in each cluster. We analyze the asymptotic worst-case running times and provide experimental results that demonstrate the viability and utility of this approach. Conclusions The hierarchical clustering algorithm method proposed here provides a new approach to find a set of representative reconciliations in the potentially vast and diverse space of maximum parsimony reconciliations.


2011 ◽  
Vol 403-408 ◽  
pp. 1977-1980
Author(s):  
Yin Sheng Zhang ◽  
Hui Lin Shan ◽  
Jia Qiang Li ◽  
Jie Zhou

The traditional K-means clustering algorithm prematurely plunges into a local optimum because of sensitive selection of the initial cluster center. Hierarchical clustering algorithm can be used to generate the initial cluster center of K-means clustering algorithm. The geometric features of input data can achieve a good distribution by means of pretreatment and feature extraction and selection. In the learning of fuzzy neural network, Java language is used to write source code of the algorithm. The experimental results show that new algorithm has improved the clustering quality effectively.


Author(s):  
Mohana Priya K ◽  
Pooja Ragavi S ◽  
Krishna Priya G

Clustering is the process of grouping objects into subsets that have meaning in the context of a particular problem. It does not rely on predefined classes. It is referred to as an unsupervised learning method because no information is provided about the "right answer" for any of the objects. Many clustering algorithms have been proposed and are used based on different applications. Sentence clustering is one of best clustering technique. Hierarchical Clustering Algorithm is applied for multiple levels for accuracy. For tagging purpose POS tagger, porter stemmer is used. WordNet dictionary is utilized for determining the similarity by invoking the Jiang Conrath and Cosine similarity measure. Grouping is performed with respect to the highest similarity measure value with a mean threshold. This paper incorporates many parameters for finding similarity between words. In order to identify the disambiguated words, the sense identification is performed for the adjectives and comparison is performed. semcor and machine learning datasets are employed. On comparing with previous results for WSD, our work has improvised a lot which gives a percentage of 91.2%


Mathematics ◽  
2021 ◽  
Vol 9 (4) ◽  
pp. 370
Author(s):  
Shuangsheng Wu ◽  
Jie Lin ◽  
Zhenyu Zhang ◽  
Yushu Yang

The fuzzy clustering algorithm has become a research hotspot in many fields because of its better clustering effect and data expression ability. However, little research focuses on the clustering of hesitant fuzzy linguistic term sets (HFLTSs). To fill in the research gaps, we extend the data type of clustering to hesitant fuzzy linguistic information. A kind of hesitant fuzzy linguistic agglomerative hierarchical clustering algorithm is proposed. Furthermore, we propose a hesitant fuzzy linguistic Boole matrix clustering algorithm and compare the two clustering algorithms. The proposed clustering algorithms are applied in the field of judicial execution, which provides decision support for the executive judge to determine the focus of the investigation and the control. A clustering example verifies the clustering algorithm’s effectiveness in the context of hesitant fuzzy linguistic decision information.


Sign in / Sign up

Export Citation Format

Share Document