Multi-objective analysis of clusters using gravitational density based model with PSO

Author(s):  
Naveen Trivedi ◽  
Suvendu Kanungo

Background: Clustering analysis plays a vital role in knowledge discovery by extracting useful information from very large data sets. Most traditional clustering algorithms do not work well with high-dimensional data. The objective of effective clustering is to obtain well-connected, compact, and well-separated clusters. Density-based clustering (DBSCAN) is a popular clustering algorithm that uses the local density information of data points to detect clusters of arbitrary shape. The gravitational search algorithm (GSA) is an effective approach inspired by Newton's law of gravitation, under which every particle in the universe attracts every other particle. Objectives: In this paper, a novel multi-objective clustering approach is proposed to produce the desired number of valid clusters; the algorithm is further optimized to obtain optimal solutions. Method: The proposed approach is a hybrid clustering algorithm that combines GSA with DBSCAN to group the data into the desired number of clusters; in the next phase, particle swarm optimization (PSO) is applied to optimize the solutions using the fitness functions. Results: Two objective functions, quantization error and inter-cluster distance, are used to evaluate the performance of the algorithm. Conclusion: The algorithm has been compared with some well-known traditional heuristic-based methods in terms of accuracy and computational time.
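The two objective functions named in the abstract above can be illustrated with a small sketch. The abstract does not give their formulas, so the definitions below are assumptions: quantization error is taken as the mean, over non-empty clusters, of the average point-to-centroid distance (a definition commonly used in PSO-based clustering), and inter-cluster distance is taken as the minimum pairwise distance between centroids. All function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def quantization_error(points, labels, centroids):
    """Assumed definition: mean over non-empty clusters of the
    average distance from each member to its cluster centroid."""
    per_cluster = []
    for k, center in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if members:
            avg = sum(euclidean(p, center) for p in members) / len(members)
            per_cluster.append(avg)
    return sum(per_cluster) / len(per_cluster)


def min_inter_cluster_distance(centroids):
    """Assumed definition: smallest distance between any two centroids."""
    return min(
        euclidean(a, b)
        for i, a in enumerate(centroids)
        for b in centroids[i + 1:]
    )


# Toy data: two tight, well-separated clusters.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
centroids = [(0.0, 1.0), (10.0, 1.0)]

print(quantization_error(points, labels, centroids))  # 1.0
print(min_inter_cluster_distance(centroids))          # 10.0
```

Under these assumed definitions, a good solution has a low quantization error (compact clusters) and a high inter-cluster distance (separated clusters), which is why the two are treated as separate objectives.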

Author(s):  
UREERAT WATTANACHON ◽  
CHIDCHANOK LURSINSAP

Existing clustering algorithms, such as single-link clustering, k-means, CURE, and CSM, are designed to find clusters based on predefined parameters specified by users. These algorithms may be unsuccessful if the choice of parameters is inappropriate for the data set being clustered. Most of these algorithms work very well for compact and hyper-spherical clusters. In this paper, a new hybrid clustering algorithm called Self-Partition and Self-Merging (SPSM) is proposed. The SPSM algorithm partitions the input data set into several subclusters in the first phase and then removes the noisy data in the second phase. In the third phase, the normal subclusters are continuously merged to form larger clusters based on inter-cluster and intra-cluster distance criteria. Experimental results show that the SPSM algorithm handles noisy data sets efficiently and clusters data sets of arbitrary shape and differing density. Several color image examples show the versatility of the proposed method, and the results are compared with those described in the literature for the same images. The computational complexity of the SPSM algorithm is O(N^2), where N is the number of data points.


2013 ◽  
Vol 3 (4) ◽  
pp. 1-14 ◽  
Author(s):  
S. Sampath ◽  
B. Ramya

Cluster analysis is a branch of data mining that plays a vital role in bringing out hidden information in databases. Clustering algorithms help medical researchers identify the presence of natural subgroups in a data set. Different types of clustering algorithms are available in the literature, the most popular of which is k-means clustering. Although k-means is widely used, applying it requires knowing the number of clusters present in the given data set. Several solutions are available in the literature to overcome this limitation. The k-means method creates a disjoint and exhaustive partition of the data set. However, in some situations one can come across objects that belong to more than one cluster. In this paper, a clustering algorithm is proposed that produces rough clusters automatically, without requiring the user to supply the number of clusters. The efficiency of the algorithm in detecting the number of clusters present in the data set has been studied with the help of some real-life data sets. Further, a nonparametric statistical analysis of the experimental results has been carried out, using a rough version of the Davies-Bouldin index, to analyze the efficiency of the proposed algorithm in automatically detecting the number of clusters in the data set.
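For orientation, the standard (crisp) Davies-Bouldin index can be sketched as follows. This is not the rough version used in the paper, whose definition the abstract does not give; it is the classical index, in which lower values indicate more compact and better separated clusters. The function names are hypothetical, and the sketch assumes at least two non-empty clusters with distinct centroids.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(members):
    """Component-wise mean of a non-empty list of points."""
    n = len(members)
    return tuple(sum(col) / n for col in zip(*members))


def davies_bouldin(clusters):
    """Classical Davies-Bouldin index for a list of clusters,
    each given as a non-empty list of points."""
    centers = [centroid(c) for c in clusters]
    # Scatter: average distance of a cluster's members to its centroid.
    scatter = [
        sum(euclidean(p, m) for p in c) / len(c)
        for c, m in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity between cluster i and any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / euclidean(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k


good = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
bad = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 2.0), (10.0, 2.0)]]
print(davies_bouldin(good) < davies_bouldin(bad))  # True
```

One way such an index supports automatic detection of the number of clusters is to compute it for several candidate partitions and keep the one with the lowest value.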


Author(s):  
Yuancheng Li ◽  
Yaqi Cui ◽  
Xiaolong Zhang

Background: Advanced Metering Infrastructure (AMI) for the smart grid is growing rapidly, which results in exponential growth of the data collected and transmitted by the devices. Clustering this data can give the electricity company a better understanding of the personalized and differentiated needs of its users. Objective: Existing clustering algorithms for processing this data generally suffer from problems such as insufficient data utilization, high computational complexity, and low accuracy of behavior recognition. Methods: To improve clustering accuracy, this paper proposes a new clustering method based on the electrical behavior of the user. Starting from an analysis of user load characteristics, the user electricity data samples were constructed. The daily load characteristic curve was extracted using an improved extreme learning machine clustering algorithm and effective index criteria. Moreover, clustering analysis was carried out for different users from industrial, commercial, and residential areas. The improved extreme learning machine algorithm, also called the Unsupervised Extreme Learning Machine (US-ELM), is an extension and improvement of the original Extreme Learning Machine (ELM) that performs the unsupervised clustering task on the basis of the original ELM. Results: Experiments were run in MATLAB on four different data sets, and the results were compared with those of other commonly used clustering algorithms. The experimental results show that the US-ELM algorithm achieves higher accuracy in processing power data. Conclusion: The unsupervised ELM algorithm can greatly reduce time consumption and improve the effectiveness of clustering.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Baicheng Lyu ◽  
Wenhua Wu ◽  
Zhiqiang Hu

With the wide application of cluster analysis, the number of clusters is gradually increasing, as is the difficulty of selecting indicators for judging the number of clusters. Also, small clusters are crucial to discovering the extreme characteristics of data samples, but current clustering algorithms focus mainly on analyzing large clusters. In this paper, a bidirectional clustering algorithm based on local density (BCALoD) is proposed. BCALoD establishes the connection between data points based on local density, can automatically determine the number of clusters, is more sensitive to small clusters, and keeps the number of parameters to be tuned to a minimum. Based on the robustness of the cluster number to noise, a denoising method suitable for BCALoD is proposed. A different cutoff distance and cutoff density are assigned to each data cluster, which results in improved clustering performance. The clustering ability of BCALoD is verified on randomly generated datasets and city light satellite images.
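The notion of local density with a cutoff distance mentioned above can be illustrated with a generic sketch. This is not the BCALoD procedure, which the abstract does not specify; it is the simple cutoff-kernel density used in density-peak style methods, where the density of a point is the number of other points closer than a cutoff. The sketch assumes a single global cutoff, whereas the abstract states that BCALoD assigns a different cutoff to each cluster. Function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_density(points, cutoff):
    """For each point, count the other points strictly closer than `cutoff`."""
    return [
        sum(
            1
            for j, q in enumerate(points)
            if j != i and euclidean(p, q) < cutoff
        )
        for i, p in enumerate(points)
    ]


# Three nearby points and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
print(local_density(points, cutoff=1.5))  # [2, 2, 2, 0]
```

Points with zero or very low density under a given cutoff are natural candidates for noise or for very small clusters, which is the kind of distinction a density-based method has to make.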


2021 ◽  
Vol 12 (4) ◽  
pp. 169-185
Author(s):  
Saida Ishak Boushaki ◽  
Omar Bendjeghaba ◽  
Nadjet Kamel

Clustering is an important unsupervised analysis technique for big data mining. It is applied in several domains, including the biomedical documents of the MEDLINE database. Document clustering based on metaheuristics is an active research area. However, these algorithms tend to get trapped in local optima, require many parameters to be adjusted, and need the documents to be indexed by a high-dimensional matrix using the traditional vector space model. To overcome these limitations, this paper proposes a new parameter-free document clustering algorithm (ASOS-LSI). It is based on the recent symbiotic organisms search (SOS) metaheuristic and is enhanced by an acceleration technique. Furthermore, the documents are represented by semantic indexing based on the well-known latent semantic indexing (LSI). Experiments on well-known biomedical document datasets show the significant superiority of ASOS-LSI over five well-known algorithms in terms of compactness, F-measure, purity, misclassified documents, entropy, and runtime.


2011 ◽  
pp. 24-32 ◽  
Author(s):  
Nicoleta Rogovschi ◽  
Mustapha Lebbah ◽  
Younès Bennani

Most traditional clustering algorithms are limited to handling data sets that contain either continuous or categorical variables. However, data sets with mixed types of variables are common in the data mining field. In this paper we introduce a weighted self-organizing map for the clustering, analysis, and visualization of mixed data (continuous/binary). The weights and prototypes are learned simultaneously, ensuring an optimized data clustering. The higher the weight of a variable, the more the clustering algorithm takes into account the information carried by that variable. The learning of these topological maps is combined with a weighting process for the different variables, which computes weights that influence the quality of the clustering. We illustrate the power of this method with data sets taken from a public data set repository: a handwritten digit data set, the Zoo data set, and three other mixed data sets. The results show a good quality of topological ordering and homogeneous clustering.
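The effect of variable weights on prototype assignment can be sketched as follows. The abstract does not state the distance or the weight-learning rule used by the authors, so this sketch assumes a weighted squared Euclidean distance with fixed, user-supplied weights and shows only the best-matching-unit step of a self-organizing map. The function names and the example weights are hypothetical.

```python
def weighted_sq_distance(x, prototype, weights):
    """Weighted squared Euclidean distance: sum_j w_j * (x_j - m_j)^2."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, prototype))


def best_matching_unit(x, prototypes, weights):
    """Index of the prototype closest to x under the weighted distance."""
    return min(
        range(len(prototypes)),
        key=lambda i: weighted_sq_distance(x, prototypes[i], weights),
    )


x = (0.2, 1.0)
prototypes = [(0.0, 0.0), (1.0, 1.0)]

# Equal weights: the second variable dominates, so prototype 1 wins.
print(best_matching_unit(x, prototypes, weights=(1.0, 1.0)))  # 1
# Down-weighting the second variable flips the assignment to prototype 0.
print(best_matching_unit(x, prototypes, weights=(1.0, 0.1)))  # 0
```

The example shows why the weights matter: the same observation is assigned to a different map unit once the second variable is given less influence.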


2021 ◽  
Vol 8 (10) ◽  
pp. 43-50
Author(s):  
Truong et al. ◽  

Clustering is a fundamental technique in data mining and machine learning. Recently, many researchers have become interested in the problem of clustering categorical data, and several new approaches have been proposed. One of the successful and pioneering clustering algorithms is the Minimum-Minimum Roughness (MMR) algorithm, a top-down hierarchical clustering algorithm that can handle the uncertainty in clustering categorical data. However, MMR tends to choose the category with less value leaf node with more objects, leading to undesirable clustering results. To overcome such shortcomings, this paper proposes an improved version of the MMR algorithm for clustering categorical data, called IMMR (Improved Minimum-Minimum Roughness). Experimental results on real data sets taken from UCI show that the IMMR algorithm outperforms MMR in clustering categorical data.


2016 ◽  
Vol 69 (5) ◽  
pp. 1143-1153 ◽  
Author(s):  
Marta Wlodarczyk–Sielicka ◽  
Andrzej Stateczny

An electronic navigational chart is a major source of information for the navigator. The component that contributes most significantly to the safety of navigation on water is the information on the depth of an area. For the purposes of this article, the authors use data obtained by the interferometric sonar GeoSwath Plus. The data were collected in the area of the Port of Szczecin. The samples constitute large sets of data. Data reduction is a procedure that reduces the size of a data set to make it easier and more effective to analyse. The main objective of the authors is to develop a new reduction algorithm for bathymetric data. The clustering of data is the first part of the search algorithm. The next step consists of generalisation of the bathymetric data. This article presents a comparison and analysis of the results of clustering bathymetric data using the following selected methods: the k-means clustering algorithm, traditional hierarchical clustering algorithms, and the self-organising map (using artificial neural networks).

