A New Cutset-type Kernelled Possibilistic C-Means Clustering Segmentation Algorithm Based on SLIC Super-pixels

Author(s):  
Jiulun Fan ◽  
Haiyan Yu ◽  
Yang Yan ◽  
Mengfei Gao

The kernelled possibilistic C-means clustering algorithm (KPCM) can effectively cluster hyper-sphere data with noise and outliers by introducing the kernel method into the possibilistic C-means clustering (PCM) algorithm. However, the KPCM still suffers from the same coincident clustering problem as the PCM algorithm because it lacks between-class relationships. Therefore, this paper introduces cut-set theory into the KPCM and modifies the possibilistic memberships in the iterative process. A cutset-type kernelled possibilistic C-means clustering (CKPCM) algorithm is then proposed to overcome the coincident clustering problem of the KPCM. An adaptive method of estimating the cut-set threshold by averaging inter-class distances is also given. Additionally, a cutset-type kernelled possibilistic C-means clustering segmentation algorithm based on SLIC super-pixels (SS-C-KPCM) is proposed to improve the segmentation quality and efficiency for color images. Experimental results on artificial data sets and image segmentation simulations demonstrate the excellent performance of the proposed algorithms.
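As a rough illustration of the possibilistic membership that KPCM-style algorithms iterate on, the sketch below uses the standard PCM typicality formula with a Gaussian-kernel-induced distance. The kernel parameterization, the parameter names (`sigma`, `eta`, `m`) and the function names are assumptions made for illustration; the abstract does not spell out the cut-set modification, so it is not reproduced here.

```python
import math

def gaussian_kernel(x, v, sigma):
    # Assumed Gaussian kernel K(x, v) = exp(-||x - v||^2 / sigma^2).
    sq = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq / (sigma ** 2))

def kernel_sq_distance(x, v, sigma):
    # Squared distance in feature space: K(x,x) + K(v,v) - 2K(x,v) = 2(1 - K(x,v)).
    return 2.0 * (1.0 - gaussian_kernel(x, v, sigma))

def typicality(x, v, eta, m=2.0, sigma=1.0):
    # Standard PCM typicality: 1 / (1 + (d^2 / eta)^(1 / (m - 1))).
    # It depends only on the point and this one centre, not on other clusters.
    d2 = kernel_sq_distance(x, v, sigma)
    return 1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1.0)))

if __name__ == "__main__":
    centre = (0.0, 0.0)
    for point in [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0)]:
        print(point, round(typicality(point, centre, eta=1.0), 4))
```

Because each typicality is computed from a single centre, nothing in this update stops two centres from drifting to the same location, which is the coincident clustering problem the abstract describes.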

2021 ◽  
Vol 25 (6) ◽  
pp. 1507-1524
Author(s):  
Chunying Zhang ◽  
Ruiyan Gao ◽  
Jiahao Wang ◽  
Song Chen ◽  
Fengchun Liu ◽  
...  

In order to solve the clustering problem for incomplete, categorical matrix data sets, and considering the uncertain relationship between samples and clusters, a set pair k-modes clustering algorithm (MD-SPKM) is proposed. Firstly, the correlation theory of set pair information granules is introduced into k-modes clustering. By improving the distance formula of the traditional k-modes algorithm, a set pair distance measure between incomplete matrix samples is defined. Secondly, considering the uncertain relationship between a sample and a cluster, the intra-cluster average distance and a threshold formula for deciding whether a sample belongs to multiple clusters are defined; the resulting set pair clustering consists of a positive region, a boundary region and a negative region. Finally, an experimental evaluation on three selected data sets against four comparison algorithms shows that the set pair k-modes clustering algorithm can effectively handle incomplete categorical matrix data sets and achieves good clustering performance in terms of Accuracy, Recall, ARI and NMI.


Author(s):  
Pierre-Alexandre Murena ◽  
Jérémie Sublime ◽  
Basarab Matei ◽  
Antoine Cornuéjols

Clustering is a compression task that consists in grouping similar objects into clusters. In real-life applications, the system may have access to several views of the same data, and each view may be processed by a specific clustering algorithm: this framework is called multi-view clustering and can benefit from algorithms capable of exchanging information between the different views. In this paper, we consider this type of unsupervised ensemble learning as a compression problem and develop a theoretical framework based on algorithmic information theory that is suitable for multi-view clustering and collaborative clustering applications. Using this approach, we propose a new algorithm built on a solid theoretical basis and test it on several real and artificial data sets.


2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Li Guo ◽  
Kunlin Zhu ◽  
Ruijun Duan

In order to explore the economic development trend in the postepidemic era, this paper improves the traditional clustering algorithm and constructs a postepidemic economic development trend analysis model based on intelligent algorithms. In order to solve the clustering problem of large-scale nonuniform density data sets, this paper proposes an adaptive nonuniform density clustering algorithm based on balanced iterative reduction and uses the algorithm to further cluster the compressed data sets. For large-scale data sets, the clustering results can accurately reflect the class characteristics of the data set as a whole. Moreover, the algorithm greatly improves the time efficiency of clustering. From the research results, we can see that the improved clustering algorithm has a certain effect on the analysis of economic development trends in the postepidemic era and can continue to play a role in subsequent economic analysis.


2012 ◽  
Vol 263-266 ◽  
pp. 2203-2206
Author(s):  
Bin Liu ◽  
Long Wang ◽  
Hai Yan Liu

Building on the basic principle of the FCM (fuzzy c-means clustering) algorithm and on theoretical research, this paper proposes an FCM cluster analysis method based on a Sobel extraction algorithm. To ensure the quality of image reconstruction and edge information extraction, the characteristics of the Sobel operator are analyzed. First, the approximate optimal solution obtained by the improved FCM algorithm is taken as the initial value; it is then combined with an intensity-texture-position feature space to produce the connected regions shown in the image, which yields the final segmentation result. The experimental results show that, for image segmentation, this algorithm based on Sobel extraction provides fast segmentation with high perceptual quality.
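For readers unfamiliar with the edge-extraction step, here is a minimal Python sketch of the standard 3x3 Sobel operator, which is assumed to be the operator the abstract refers to. It computes only a gradient magnitude on a small grayscale image given as a list of rows; the function name and the choice to leave border pixels at zero are illustrative assumptions, and the FCM and feature-space steps of the paper are not reproduced.

```python
import math

# Standard 3x3 Sobel kernels for horizontal (x) and vertical (y) gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_magnitude(image):
    """Return the gradient magnitude per pixel; border pixels are left at 0."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0
            for i in range(3):
                for j in range(3):
                    pixel = image[r + i - 1][c + j - 1]
                    gx += SOBEL_X[i][j] * pixel
                    gy += SOBEL_Y[i][j] * pixel
            out[r][c] = math.hypot(gx, gy)
    return out

if __name__ == "__main__":
    # A vertical step edge between the second and third columns.
    step = [[0, 0, 10, 10] for _ in range(4)]
    for row in sobel_magnitude(step):
        print(row)
```

On the step image the interior pixels next to the edge get a large response while a flat image gives zero everywhere, which is what makes the operator usable as an edge cue alongside clustering.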


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Dan Zhang ◽  
Yingcang Ma ◽  
Hu Zhao ◽  
Xiaofei Yang

Clustering is one of the important research topics in the field of machine learning. Neutrosophic clustering is a generalization of fuzzy clustering and has been applied in many fields. This paper presents a new neutrosophic clustering algorithm based on regularization. Firstly, a regularization term is introduced into the FC-PFS algorithm to generate sparsity, which can reduce the complexity of the algorithm on large data sets. Secondly, we propose a method to simplify the process of determining the regularization parameters. Finally, experiments show that the clustering results of this algorithm on artificial and real data sets are in most cases better than those of other clustering algorithms.


2015 ◽  
Vol 2015 ◽  
pp. 1-12
Author(s):  
Mingzhi Ma ◽  
Qifang Luo ◽  
Yongquan Zhou ◽  
Xin Chen ◽  
Liangliang Li

Animal migration optimization (AMO) is one of the most recently introduced algorithms based on the behavior of animal swarm migration. This paper presents an improved AMO algorithm (IAMO), which significantly improves on the original AMO in solving complex optimization problems. Clustering is a popular data analysis and data mining technique that is used in many fields. The best-known method for solving clustering problems is the k-means clustering algorithm; however, it depends heavily on the initial solution and easily falls into a local optimum. To remedy these defects of the k-means method, this paper applies IAMO to the clustering problem and runs experiments on synthetic and real-life data sets. The simulation results show that the algorithm performs better than the k-means, PSO, CPSO, ABC, CABC, and AMO algorithms on the clustering problem.
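The dependence of k-means on its initial solution can be seen on a toy example. The sketch below is a plain Lloyd-style k-means in Python (not the IAMO method of the paper); the four points and the two initializations are made up for illustration, and `kmeans` and `sq_dist` are hypothetical helper names.

```python
def sq_dist(p, q):
    # Squared Euclidean distance between two points.
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centers, iters=20):
    """Lloyd iterations from the given initial centers; returns (centers, SSE)."""
    centers = list(centers)
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            clusters[idx].append(p)
        # Recompute each center as the mean of its cluster (keep it if empty).
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[k]
            for k, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse

if __name__ == "__main__":
    # Corners of a wide rectangle: the natural clusters are left and right.
    points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
    print(kmeans(points, [(0.0, 0.0), (4.0, 1.0)]))  # good start
    print(kmeans(points, [(2.0, 0.0), (2.0, 1.0)]))  # poor start
```

With the second initialization the centers never move, so the algorithm stops at a top/bottom split with a much larger sum of squared errors than the left/right split. This is the kind of local optimum that motivates using a global search heuristic to choose the centers.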


Author(s):  
Yuancheng Li ◽  
Yaqi Cui ◽  
Xiaolong Zhang

Background: Advanced Metering Infrastructure (AMI) for the smart grid is growing rapidly, which results in exponential growth of the data collected and transmitted by the devices. Clustering this data can give the electricity company a better understanding of the personalized and differentiated needs of users. Objective: Existing clustering algorithms for processing such data generally have problems such as insufficient data utilization, high computational complexity and low accuracy of behavior recognition. Methods: In order to improve the clustering accuracy, this paper proposes a new clustering method based on the electrical behavior of the user. Starting with the analysis of user load characteristics, the user electricity data samples were constructed. The daily load characteristic curve was extracted through an improved extreme learning machine clustering algorithm and effective index criteria. Moreover, clustering analysis was carried out for different users from industrial, commercial and residential areas. The improved extreme learning machine algorithm, also called Unsupervised Extreme Learning Machine (US-ELM), is an extension and improvement of the original Extreme Learning Machine (ELM) that performs unsupervised clustering on the basis of the original ELM. Results: Experiments were run in MATLAB on four different data sets, and the results were compared with those of other commonly used clustering algorithms. The experimental results show that the US-ELM algorithm has higher accuracy in processing power data. Conclusion: The unsupervised ELM algorithm can greatly reduce the time consumption and improve the effectiveness of clustering.


2021 ◽  
Vol 13 (9) ◽  
pp. 4648
Author(s):  
Rana Muhammad Adnan ◽  
Kulwinder Singh Parmar ◽  
Salim Heddam ◽  
Shamsuddin Shahid ◽  
Ozgur Kisi

The accurate estimation of suspended sediments (SSs) carries significance in determining the volume of dam storage, river carrying capacity, pollution susceptibility, soil erosion potential, aquatic ecological impacts, and the design and operation of hydraulic structures. The presented study proposes a new method for accurately estimating daily SSs using antecedent discharge and sediment information. The novel method is developed by hybridizing the multivariate adaptive regression spline (MARS) and the K-means clustering algorithm (MARS–KM). The proposed method's efficacy is established by comparing its performance with the adaptive neuro-fuzzy system (ANFIS), MARS, and M5 tree (M5Tree) models in predicting SSs at two stations situated on the Yangtze River of China, according to three assessment measures: RMSE, MAE, and NSE. Two modeling scenarios are employed; data are divided 50–50% for model training and testing in the first scenario, and the training and test data sets are swapped in the second scenario. At Guangyuan Station, the MARS–KM improved on the ANFIS, MARS, and M5Tree methods in terms of RMSE by 39%, 30%, and 18% in the first scenario and by 24%, 22%, and 8% in the second scenario, respectively, while at Beibei Station the corresponding RMSE improvements over ANFIS, MARS, and M5Tree were 34%, 26%, and 27% in the first scenario and 7%, 16%, and 6% in the second scenario. Additionally, the MARS–KM models provided much more satisfactory estimates using only discharge values as inputs.
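The three assessment measures named in the abstract have standard definitions, sketched below in plain Python. The function names are illustrative, and `rmse_improvement_pct` assumes that "improvement" means the relative reduction in RMSE against a baseline model, which the abstract does not state explicitly.

```python
import math

def rmse(obs, pred):
    # Root mean square error.
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    # Mean absolute error.
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def nse(obs, pred):
    # Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches predicting the mean.
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def rmse_improvement_pct(rmse_baseline, rmse_new):
    # Assumed definition: percentage reduction in RMSE relative to the baseline.
    return 100.0 * (rmse_baseline - rmse_new) / rmse_baseline

if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0]
    predicted = [2.0, 2.0, 2.0]  # made-up values, not data from the study
    print(rmse(observed, predicted), mae(observed, predicted), nse(observed, predicted))
```

Under the assumed definition, a baseline RMSE of 10 and a new RMSE of 7 would be reported as a 30% improvement.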


BMC Biology ◽  
2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Daniele Raimondi ◽  
Antoine Passemiers ◽  
Piero Fariselli ◽  
Yves Moreau

Background: Identifying variants that drive tumor progression (driver variants) and distinguishing these from variants that are a byproduct of the uncontrolled cell growth in cancer (passenger variants) is a crucial step for understanding tumorigenesis and precision oncology. Various bioinformatics methods have attempted to solve this complex task. Results: In this study, we investigate the assumptions on which these methods are based, showing that the different definitions of driver and passenger variants influence the difficulty of the prediction task. More importantly, we prove that the data sets have a construction bias which prevents the machine learning (ML) methods from actually learning variant-level functional effects, despite their excellent performance. This effect results from the fact that in these data sets the driver variants map to a few driver genes, while the passenger variants are spread across thousands of genes, so that just learning to recognize driver genes provides almost perfect predictions. Conclusions: To mitigate this issue, we propose a novel data set that minimizes this bias by ensuring that all genes covered by the data contain both driver and passenger variants. As a result, we show that the tested predictors experience a significant drop in performance, which should not be considered as poorer modeling, but rather as correcting unwarranted optimism. Finally, we propose a weighting procedure to completely eliminate the gene effects on such predictions, thus precisely evaluating the ability of predictors to model the functional effects of single variants, and we show that this task is indeed still open.


2011 ◽  
Vol 121-126 ◽  
pp. 2372-2376
Author(s):  
Dan Dan Wang ◽  
Yu Zhou ◽  
Qing Wei Ye ◽  
Xiao Dong Wang

The mode peaks in the frequency domain of a vibration signal are strongly affected by heavy noise, which leads to inaccurate mode parameters. To address this, this paper proposes mode-peak segmentation based on the spectral clustering algorithm. First, following the concept of a wave packet, the amplitude-frequency curve of the vibration signal is divided into wave packets. Taking each wave packet as a sample, the spectral clustering algorithm is used to classify these wave packets, so that the amplitude-frequency curve of a mode peak becomes one large wave packet at the macroscopic level. Experiments on simulated signals indicate that this spectral clustering approach agrees well with the macroscopic observation of mode segmentation and performs especially well under strong noise.

