HIERARCHICAL SPHERICAL CLUSTERING

Author(s):  
VICENÇ TORRA ◽  
SADAAKI MIYAMOTO

This work introduces an alternative representation for high-dimensional data sets. Instead of using 2D or 3D representations, the data are located on the surface of a sphere. Together with this representation, a hierarchical clustering algorithm is defined to analyse and extract the structure of the data. The algorithm builds a hierarchical structure (a dendrogram) in such a way that different cuts of the structure lead to different partitions of the surface of the sphere. This can be seen as a set of concentric spheres, each of a different granularity. In addition, a method based on Sammon's mapping has been developed to obtain an initial assignment of the data to the surface of the sphere.
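The idea of clustering points that lie on a sphere can be sketched as follows. This is only an illustration under stated assumptions, not the authors' algorithm: the abstract does not say which linkage is used, so single linkage is assumed here, the points are assumed to be already placed on the unit sphere (the Sammon-style initial assignment is not shown), and the names `great_circle` and `agglomerate` are hypothetical.

```python
import math
from itertools import combinations

def great_circle(u, v):
    """Angle in radians between two unit vectors on the sphere."""
    dot = sum(a * b for a, b in zip(u, v))
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
    return math.acos(max(-1.0, min(1.0, dot)))

def agglomerate(points):
    """Single-linkage agglomerative clustering (an assumed linkage).

    Returns the merge history as (distance, cluster_a, cluster_b) tuples,
    where clusters are frozensets of point indices. Replaying the history
    up to a chosen distance gives one cut of the dendrogram.
    """
    clusters = [frozenset([i]) for i in range(len(points))]
    history = []
    while len(clusters) > 1:
        # Closest pair of clusters: smallest angle between any two members.
        d, a, b = min(
            (
                (min(great_circle(points[i], points[j]) for i in a for j in b), a, b)
                for a, b in combinations(clusters, 2)
            ),
            key=lambda t: t[0],
        )
        history.append((d, a, b))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return history
```

Each prefix of the merge history corresponds to one partition of the sphere's surface, so cutting at a larger angle yields a coarser partition.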

2021 ◽  
Vol 8 (10) ◽  
pp. 43-50
Author(s):  
Truong et al.

Clustering is a fundamental technique in data mining and machine learning. Recently, many researchers have become interested in the problem of clustering categorical data, and several new approaches have been proposed. One of the successful and pioneering clustering algorithms is the Minimum-Minimum Roughness algorithm (MMR), a top-down hierarchical clustering algorithm that can handle the uncertainty in clustering categorical data. However, MMR tends to choose the category with fewer values and the leaf node with more objects, leading to undesirable clustering results. To overcome these shortcomings, this paper proposes an improved version of the MMR algorithm for clustering categorical data, called IMMR (Improved Minimum-Minimum Roughness). Experimental results on real data sets taken from UCI show that the IMMR algorithm outperforms MMR in clustering categorical data.


2018 ◽  
Vol 31 (11) ◽  
pp. 8051-8068 ◽  
Author(s):  
Dongdong Cheng ◽  
Qingsheng Zhu ◽  
Jinlong Huang ◽  
Quanwang Wu ◽  
Lijun Yang

2011 ◽  
Vol 08 (03) ◽  
pp. 597-609 ◽  
Author(s):  
Y. T. ZHOU ◽  
Z. H. HE ◽  
Z. G. WU

An adaptive parallel algorithm for hierarchical clustering based on the PRAM model is presented. Three components were devised to produce the optimized clustered data set: data preprocessing based on the "90-10" rule to decrease the size of the data set, a progressive parallel algorithm that creates Euclidean minimum spanning trees on the absolute graph, and an algorithm that determines the split strategies and deals with memory conflicts. The data set was clustered on the PRAM-EREW model, which is free of memory collisions, has the lowest cost, and is the weakest PRAM variant. A data set of n items was clustered in O((λn)²/p) time (0.1 ≤ λ ≤ 0.3) by running this algorithm on p processors (1 ≤ p ≤ n/log n). The parallel hierarchical clustering algorithm based on the PRAM model is adaptive and free of memory collisions. The computing time can be significantly reduced once the original input data has been effectively preprocessed with the improved preprocessing methods presented in this paper.
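The link between minimum spanning trees and hierarchical clustering can be illustrated with a small sequential sketch. It is not the parallel PRAM algorithm of the paper: it assumes Prim's algorithm on the complete Euclidean graph and obtains k clusters by cutting the k-1 heaviest tree edges, which is equivalent to single-linkage clustering. The preprocessing and split strategies of the abstract are not modelled, and the function names are hypothetical.

```python
import math

def euclidean_mst(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns the tree as a list of (weight, i, j) edges.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge linking each vertex to the tree
    parent = [-1] * n
    if n:
        best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree with the cheapest connecting edge.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                w = math.dist(points[u], points[v])
                if w < best[v]:
                    best[v], parent[v] = w, u
    return edges

def mst_clusters(points, k):
    """Keep the n-k lightest MST edges and label the connected components."""
    kept = sorted(euclidean_mst(points))[: max(len(points) - k, 0)]
    root = list(range(len(points)))  # union-find parent array

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for _, i, j in kept:
        root[find(i)] = find(j)
    return [find(i) for i in range(len(points))]
```

Points that receive the same label belong to the same cluster; varying k walks through the levels of the hierarchy.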


Author(s):  
MICHEL BRUYNOOGHE

The clustering of large data sets is of great interest in fields such as pattern recognition, numerical taxonomy, and image or speech processing. The traditional Ascendant Hierarchical Clustering (AHC) algorithm cannot be run on sets of more than a few thousand elements. The reducible neighborhoods clustering algorithm presented in this paper overcomes the limits of the traditional hierarchical clustering algorithm by generating an exact hierarchy on a large data set. The theoretical justification of this algorithm is the so-called Bruynooghe reducibility principle, which lays down the condition under which the exact hierarchy may be constructed locally, by carrying out aggregations in restricted regions of the representation space. Like the Day and Edelsbrunner algorithm, the reducible neighborhoods clustering algorithm has a worst-case theoretical time complexity of O(n² log n), regardless of the chosen clustering strategy. However, the reducible neighborhoods clustering algorithm uses the original data table, and its practical performance is far better than that of Day and Edelsbrunner's algorithm, thus allowing the hierarchical clustering of large data sets, i.e. those composed of more than 10 000 objects.
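The abstract does not spell out the reducibility condition. In the form usually quoted in the clustering literature (assumed here to be the form the paper relies on), with d a cluster dissimilarity, ρ a threshold, and i, j, k clusters, it reads:

```latex
d(i, j) \le \rho,\quad d(i, k) \ge \rho,\quad d(j, k) \ge \rho
\;\Longrightarrow\; d(i \cup j,\, k) \ge \rho
```

In words, merging two clusters that are closer than ρ cannot bring the merged cluster closer than ρ to any third cluster, which is what allows aggregations to be carried out locally without changing the resulting hierarchy.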


Author(s):  
Mohana Priya K ◽  
Pooja Ragavi S ◽  
Krishna Priya G

Clustering is the process of grouping objects into subsets that have meaning in the context of a particular problem. It does not rely on predefined classes. It is referred to as an unsupervised learning method because no information is provided about the "right answer" for any of the objects. Many clustering algorithms have been proposed and are used in different applications. Sentence clustering is one of the best clustering techniques. A hierarchical clustering algorithm is applied at multiple levels for accuracy. For tagging, a POS tagger and the Porter stemmer are used. The WordNet dictionary is used to determine similarity by invoking the Jiang-Conrath and cosine similarity measures. Grouping is performed with respect to the highest similarity value, using a mean threshold. This paper incorporates many parameters for finding the similarity between words. To disambiguate words, sense identification is performed for the adjectives and a comparison is made. The SemCor and machine learning data sets are employed. Compared with previous results for word sense disambiguation (WSD), our work shows a considerable improvement, reaching 91.2%.
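The cosine measure and the mean threshold mentioned above can be sketched roughly as follows. This is a simplified illustration, not the paper's pipeline: it assumes plain whitespace tokenization and term-count vectors, leaves out POS tagging, stemming, and the WordNet-based Jiang-Conrath measure, and reads "mean threshold" as the mean of all pairwise similarities. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def cosine(a, b):
    """Cosine similarity between the term-count vectors of two sentences."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0

def similar_pairs(sentences):
    """Return index pairs whose similarity reaches the mean pairwise similarity."""
    sims = {
        (i, j): cosine(sentences[i], sentences[j])
        for i, j in combinations(range(len(sentences)), 2)
    }
    if not sims:
        return []
    threshold = sum(sims.values()) / len(sims)  # assumed reading of "mean threshold"
    return [pair for pair, s in sims.items() if s >= threshold]
```

The returned pairs could then seed the first level of a hierarchical grouping, with the same step repeated on the merged groups.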


Author(s):  
Yuancheng Li ◽  
Yaqi Cui ◽  
Xiaolong Zhang

Background: Advanced Metering Infrastructure (AMI) for the smart grid is growing rapidly, which results in exponential growth of the data collected and transmitted by the devices. Clustering this data can give the electricity company a better understanding of the personalized and differentiated needs of users. Objective: Existing clustering algorithms for processing such data generally have problems such as insufficient data utilization, high computational complexity, and low accuracy of behavior recognition. Methods: To improve clustering accuracy, this paper proposes a new clustering method based on the electrical behavior of the user. Starting with an analysis of user load characteristics, samples of user electricity data were constructed. The daily load characteristic curve was extracted using an improved extreme learning machine clustering algorithm and effective index criteria. Moreover, clustering analysis was carried out for different users from industrial, commercial, and residential areas. The improved extreme learning machine algorithm, also called Unsupervised Extreme Learning Machine (US-ELM), extends the original Extreme Learning Machine (ELM) so that it can perform unsupervised clustering. Results: Experiments on four different data sets, programmed in MATLAB, compared the method with other commonly used clustering algorithms. The experimental results show that the US-ELM algorithm achieves higher accuracy in processing power data. Conclusion: The unsupervised ELM algorithm can greatly reduce time consumption and improve the effectiveness of clustering.

