An Information Theory based Approach to Multisource Clustering

Author(s):  
Pierre-Alexandre Murena ◽  
Jérémie Sublime ◽  
Basarab Matei ◽  
Antoine Cornuéjols

Clustering is a compression task that consists of grouping similar objects into clusters. In real-life applications, the system may have access to several views of the same data, and each view may be processed by a specific clustering algorithm. This framework is called multi-view clustering and can benefit from algorithms capable of exchanging information between the different views. In this paper, we treat this type of unsupervised ensemble learning as a compression problem and develop a theoretical framework, based on algorithmic information theory, that is suitable for multi-view clustering and collaborative clustering applications. Using this approach, we propose a new algorithm with a solid theoretical basis and test it on several real and artificial data sets.

Author(s):  
SANGHAMITRA BANDYOPADHYAY ◽  
UJJWAL MAULIK ◽  
MALAY KUMAR PAKHIRA

An efficient partitional clustering technique, called SAKM-clustering, is proposed in this article. It integrates the power of simulated annealing for obtaining a minimum-energy configuration with the searching capability of the K-means algorithm. The clustering methodology is used to search for appropriate clusters in multidimensional feature space such that a similarity metric of the resulting clusters is optimized. Data points are redistributed among the clusters probabilistically, so that points that are farther away from the cluster center have higher probabilities of migrating to other clusters than those that are closer to it. The superiority of the SAKM-clustering algorithm over the widely used K-means algorithm is extensively demonstrated for artificial and real-life data sets.
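The probabilistic redistribution described above can be illustrated with a minimal sketch. It is not the authors' SAKM procedure: it assumes a Boltzmann-style rule in which a point joins cluster j with probability proportional to exp(-d_j / T), and a geometric cooling schedule. The function names (`boltzmann_assign`, `update_centers`, `annealed_kmeans`) and all parameter values are hypothetical.

```python
import math
import random

def boltzmann_assign(point, centers, temperature, rng):
    """Pick a cluster index with probability proportional to exp(-d / T).

    Assumed rule (not taken from the paper): points far from their nearest
    center have flatter weights, so they migrate more often.
    """
    dists = [math.dist(point, c) for c in centers]
    d_min = min(dists)
    # Shift by the minimum distance so the largest weight is exactly 1.
    weights = [math.exp(-(d - d_min) / temperature) for d in dists]
    return rng.choices(range(len(centers)), weights=weights, k=1)[0]

def update_centers(points, labels, centers):
    """Recompute each center as the mean of its members; keep empty clusters' old centers."""
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
        else:
            new_centers.append(old)
    return new_centers

def annealed_kmeans(points, k, t_start=1.0, t_end=1e-3, cooling=0.9, seed=0):
    """K-means with stochastic assignments whose randomness shrinks as T cools."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    t = t_start
    while t > t_end:
        labels = [boltzmann_assign(p, centers, t, rng) for p in points]
        centers = update_centers(points, labels, centers)
        t *= cooling  # geometric cooling schedule (an assumption)
    # Final deterministic pass: plain nearest-center assignment.
    labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
    return centers, labels

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = annealed_kmeans(points, k=2)
```

At high temperature the assignments are close to uniform, which lets the search leave poor configurations; as the temperature approaches zero the rule reduces to the ordinary K-means nearest-center step.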


2013 ◽  
Vol 411-414 ◽  
pp. 1884-1893
Author(s):  
Yong Chun Cao ◽  
Ya Bin Shao ◽  
Shuang Liang Tian ◽  
Zheng Qi Cai

Because many clustering algorithms based on genetic algorithms (GAs) suffer from degeneracy and easily fall into local optima, a novel dynamic genetic algorithm for clustering problems (DGA) is proposed. The algorithm adopts variable-length coding to represent individuals and performs the parallel crossover operation within subpopulations of individuals of the same length. This allows DGA to explore the search space more effectively and to obtain automatically the proper number of clusters and the proper partition of a given data set. The algorithm also uses a dynamic crossover probability and an adaptive mutation probability, which prevent it from getting stuck at a locally optimal solution. Clustering results from experiments on three artificial data sets and two real-life data sets show that DGA achieves better performance and higher accuracy on clustering problems.


2013 ◽  
Vol 3 (4) ◽  
pp. 1-14 ◽  
Author(s):  
S. Sampath ◽  
B. Ramya

Cluster analysis is a branch of data mining that plays a vital role in bringing out hidden information in databases. Clustering algorithms help medical researchers identify the presence of natural subgroups in a data set. Different types of clustering algorithms are available in the literature, the most popular among them being k-means clustering. Although k-means clustering is widely used, its application requires knowledge of the number of clusters present in the given data set. Several solutions are available in the literature to overcome this limitation. The k-means clustering method creates a disjoint and exhaustive partition of the data set. However, in some situations one can come across objects that belong to more than one cluster. In this paper, a clustering algorithm is proposed that is capable of producing rough clusters automatically, without requiring the user to supply the number of clusters to be produced. The efficiency of the algorithm in detecting the number of clusters present in the data set has been studied with the help of some real-life data sets. Further, a nonparametric statistical analysis of the results of the experimental study has been carried out in order to analyze the efficiency of the proposed algorithm in automatically detecting the number of clusters in the data set with the help of a rough version of the Davies-Bouldin index.
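For orientation, the standard (crisp) Davies-Bouldin index can be sketched as follows. This is not the rough version used in the paper, whose definition is not given in the abstract; the function name `davies_bouldin` and the toy data are illustrative assumptions. Lower values indicate more compact, better separated clusters, which is why the index can be compared across candidate numbers of clusters.

```python
import math

def davies_bouldin(points, labels):
    """Standard Davies-Bouldin index for a hard partition (lower is better).

    Assumes at least two clusters with distinct centroids; coincident
    centroids would cause a division by zero.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the index needs at least two clusters")
    ids = sorted(clusters)
    # Centroid of each cluster (coordinate-wise mean).
    centroids = {
        i: tuple(sum(x) / len(clusters[i]) for x in zip(*clusters[i])) for i in ids
    }
    # Scatter: mean distance of a cluster's points to its centroid.
    scatter = {
        i: sum(math.dist(p, centroids[i]) for p in clusters[i]) / len(clusters[i])
        for i in ids
    }
    total = 0.0
    for i in ids:
        # Worst-case similarity between cluster i and any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in ids
            if j != i
        )
    return total / len(ids)

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = davies_bouldin(points, [0, 0, 1, 1])  # left pair vs. right pair
bad = davies_bouldin(points, [0, 1, 0, 1])   # bottom pair vs. top pair
```

On this toy data the left/right partition scores lower than the bottom/top partition, matching the intuition that it is the more natural grouping.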


2012 ◽  
Vol 58 (4) ◽  
pp. 323-326
Author(s):  
Krzysztof Hryniów ◽  
Andrzej Dzieliński

Sequential pattern mining is an extensively studied method for data mining. One of the newer and less documented approaches is to estimate the statistical characteristics of sequences in order to create model sequences that can be used to speed up the process of sequence mining. This paper proposes extensive modifications to one such algorithm, ProMFS (probabilistic algorithm for mining frequent sequences), which notably increase the algorithm's processing speed by significantly reducing its computational complexity. The new version of the algorithm is evaluated on real-life and artificial data sets and shown to be useful in real-time applications and problems.


2015 ◽  
Vol 2015 ◽  
pp. 1-8 ◽  
Author(s):  
Hong Peng ◽  
Xiaohui Luo ◽  
Zhisheng Gao ◽  
Jun Wang ◽  
Zheng Pei

P systems are a class of distributed parallel computing models. This paper presents a novel clustering algorithm, called the membrane clustering algorithm, which is inspired by the mechanism of a tissue-like P system with a loop structure of cells. The objects of the cells express the candidate centers of clusters and are evolved by the evolution rules. Based on the loop membrane structure, the communication rules realize a local neighborhood topology, which helps the coevolution of the objects and improves the diversity of objects in the system. The tissue-like P system can effectively search for the optimal partitioning with the help of its parallel computing advantage. The proposed clustering algorithm is evaluated on four artificial data sets and six real-life data sets. Experimental results show that the proposed clustering algorithm is superior or competitive to the k-means algorithm and several evolutionary clustering algorithms recently reported in the literature.


Author(s):  
Jiulun Fan ◽  
Haiyan Yu ◽  
Yang yan ◽  
Mengfei Gao

The kernelled possibilistic C-means clustering algorithm (KPCM) can effectively cluster hyper-sphere data with noise and outliers by introducing the kernel method into the possibilistic C-means clustering (PCM) algorithm. However, the KPCM still suffers from the same coincident clustering problem as the PCM algorithm due to the lack of between-class relationships. Therefore, this paper introduces cut-set theory into the KPCM and modifies the possibilistic memberships in the iterative process. A cutset-type kernelled possibilistic C-means clustering (CKPCM) algorithm is then proposed to overcome the coincident clustering problem of the KPCM. An adaptive method of estimating the cut-set threshold, based on averaging inter-class distances, is also given. Additionally, a cutset-type kernelled possibilistic C-means clustering segmentation algorithm based on SLIC super-pixels (SS-C-KPCM) is proposed to improve the quality and efficiency of color image segmentation. Experimental results on artificial data sets and image segmentation simulations demonstrate the excellent performance of the proposed algorithms.
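The coincident clustering problem mentioned above can be seen in the standard PCM membership update, sketched here under stated assumptions. The code uses the plain (non-kernelled) squared Euclidean distance and the usual PCM typicality formula; it does not implement the cut-set modification, whose details are not given in the abstract. The function name `pcm_membership` and the values of `eta` and `m` are illustrative.

```python
def pcm_membership(sq_dist, eta, m=2.0):
    """Standard PCM typicality of a point for ONE cluster.

    sq_dist: squared distance from the point to the cluster prototype
             (a kernelled variant would use a kernel-induced distance instead).
    eta:     positive scale parameter of that cluster.
    m:       fuzzifier, must be greater than 1.
    """
    if eta <= 0 or m <= 1:
        raise ValueError("eta must be positive and m must exceed 1")
    return 1.0 / (1.0 + (sq_dist / eta) ** (1.0 / (m - 1.0)))

# Each membership depends only on the distance to its own prototype; there is
# no sum-to-one constraint linking clusters. Two prototypes at the same
# location therefore receive identical memberships and nothing pushes them apart.
point = (1.0, 0.0)
prototypes = [(0.0, 0.0), (0.0, 0.0)]
memberships = [
    pcm_membership(sum((a - b) ** 2 for a, b in zip(point, v)), eta=1.0)
    for v in prototypes
]
```

Because the update for one cluster never looks at the others, some between-class information has to be introduced to keep prototypes from collapsing onto each other, which is the role the abstract assigns to the cut-set threshold.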


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Dan Zhang ◽  
Yingcang Ma ◽  
Hu Zhao ◽  
Xiaofei Yang

Clustering is one of the important research topics in the field of machine learning. Neutrosophic clustering is a generalization of fuzzy clustering and has been applied in many fields. This paper presents a new neutrosophic clustering algorithm with the help of regularization. Firstly, a regularization term is introduced into the FC-PFS algorithm to generate sparsity, which can reduce the complexity of the algorithm on large data sets. Secondly, we propose a method to simplify the process of determining the regularization parameters. Finally, experiments show that the clustering results of this algorithm on artificial and real data sets are mostly better than those of other clustering algorithms. Our clustering algorithm is effective in most cases.


Author(s):  
Krzysztof Simiński

Neuro-rough-fuzzy approach for regression modelling from missing data

Real-life data sets often suffer from missing data. The neuro-rough-fuzzy systems proposed hitherto often cannot handle such situations. The paper presents a neuro-fuzzy system for data sets with missing values. The proposed solution is a complete neuro-fuzzy system. The system creates a rough fuzzy model from presented data (both full and with missing values) and is able to elaborate the answer for full and missing data examples. The paper also describes the dedicated clustering algorithm. The paper is accompanied by results of numerical experiments.


2015 ◽  
Vol 2015 ◽  
pp. 1-12 ◽  
Author(s):  
Mingzhi Ma ◽  
Qifang Luo ◽  
Yongquan Zhou ◽  
Xin Chen ◽  
Liangliang Li

Animal migration optimization (AMO) is one of the most recently introduced algorithms based on the behavior of animal swarm migration. This paper presents an improved AMO algorithm (IAMO), which significantly improves on the original AMO in solving complex optimization problems. Clustering is a popular data analysis and data mining technique used in many fields. The best-known method for solving clustering problems is the k-means clustering algorithm; however, it depends heavily on the initial solution and easily falls into local optima. To remedy these defects of the k-means method, this paper applies IAMO to the clustering problem and runs experiments on synthetic and real-life data sets. The simulation results show that the algorithm performs better than the k-means, PSO, CPSO, ABC, CABC, and AMO algorithms in solving the clustering problem.


Author(s):  
Yuancheng Li ◽  
Yaqi Cui ◽  
Xiaolong Zhang

Background: Advanced Metering Infrastructure (AMI) for the smart grid is growing rapidly, which results in exponential growth of the data collected and transmitted in the device. Clustering this data can give the electricity company a better understanding of the personalized and differentiated needs of the user. Objective: The existing clustering algorithms for processing this data generally have some problems, such as insufficient data utilization, high computational complexity, and low accuracy of behavior recognition. Methods: In order to improve the clustering accuracy, this paper proposes a new clustering method based on the electrical behavior of the user. Starting with the analysis of user load characteristics, the user electricity data samples were constructed. The daily load characteristic curve was extracted through an improved extreme learning machine clustering algorithm and effective index criteria. Moreover, clustering analysis was carried out for different users from industrial, commercial, and residential areas. The improved extreme learning machine algorithm, also called Unsupervised Extreme Learning Machine (US-ELM), is an extension and improvement of the original Extreme Learning Machine (ELM), which realizes the unsupervised clustering task on the basis of the original ELM. Results: Experiments on four different data sets, programmed in MATLAB, compare the method with other commonly used clustering algorithms. The experimental results show that the US-ELM algorithm has higher accuracy in processing power data. Conclusion: The unsupervised ELM algorithm can greatly reduce the time consumption and improve the effectiveness of clustering.

