A Three-Level Optimization Model for Nonlinearly Separable Clustering

2020 ◽  
Vol 34 (04) ◽  
pp. 3211-3218
Author(s):  
Liang Bai ◽  
Jiye Liang

Due to the complex structure of real-world data, nonlinearly separable clustering is one of the most popular and widely studied clustering problems. Various types of algorithms, such as kernel k-means, spectral clustering and density clustering, have been developed to solve it. However, they struggle to balance the efficiency and effectiveness of clustering, which limits their practical application. To overcome this deficiency, we propose a three-level optimization model for nonlinearly separable clustering that divides the clustering problem into three sub-problems: a linearly separable clustering on the object set, a nonlinearly separable clustering on the cluster set and an ensemble clustering on the partition set. An iterative algorithm is proposed to solve the optimization problem. The proposed algorithm recognizes nonlinearly separable clusters effectively at low computational cost. Its performance has been studied on synthetic and real data sets. Comparisons with other nonlinearly separable clustering algorithms illustrate the efficiency and effectiveness of the proposed algorithm.
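
The following is a minimal, one-pass sketch of the three-level idea, not the authors' iterative optimization: level one runs k-means on the objects, level two runs a nonlinear (spectral) clustering on the resulting cluster centers, and level three combines several such partitions with a simple co-association ensemble. All parameter values are illustrative assumptions.

```python
# One-pass sketch of a three-level nonlinear clustering pipeline (illustrative only).
import numpy as np
from sklearn.cluster import KMeans, SpectralClustering, AgglomerativeClustering
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=600, noise=0.06, random_state=0)
n_partitions, n_prototypes, n_clusters = 5, 40, 2

labels_set = []
for seed in range(n_partitions):
    # Level 1: many small, linearly separable clusters on the object set.
    km = KMeans(n_clusters=n_prototypes, n_init=5, random_state=seed).fit(X)
    # Level 2: nonlinearly separable clustering on the prototype (cluster-center) set.
    proto_labels = SpectralClustering(n_clusters=n_clusters, affinity="nearest_neighbors",
                                      n_neighbors=8, random_state=seed).fit_predict(km.cluster_centers_)
    labels_set.append(proto_labels[km.labels_])  # map object -> prototype -> class

# Level 3: ensemble clustering on the partition set via a co-association matrix.
L = np.array(labels_set)
coassoc = (L[:, :, None] == L[:, None, :]).mean(axis=0)
final = AgglomerativeClustering(n_clusters=n_clusters, metric="precomputed",
                                linkage="average").fit_predict(1.0 - coassoc)
```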

Author(s):  
Hongkang Yang ◽  
Esteban G Tabak

Abstract: The clustering problem, and more generally latent factor discovery or latent space inference, is formulated in terms of the Wasserstein barycenter problem from optimal transport. The proposed objective is the maximization of the variability attributable to class, further characterized as the minimization of the variance of the Wasserstein barycenter. Existing theory, which constrains the transport maps to rigid translations, is extended to affine transformations. The resulting non-parametric clustering algorithms include $k$-means as a special case and exhibit more robust performance. A continuous version of these algorithms discovers continuous latent variables and generalizes principal curves. The strength of these algorithms is demonstrated by tests on both artificial and real-world data sets.
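
A toy sketch of the rigid-translation special case mentioned above: each class is transported onto the common barycenter by a translation, and the variance that remains coincides with the $k$-means objective. This is an illustrative reading of the formulation under that assumption, not the authors' implementation.

```python
# Rigid-translation special case: minimizing the barycenter variance reproduces k-means.
import numpy as np

def barycenter_variance(X, labels, k):
    """Variance left after translating every class onto the common barycenter."""
    centered = X.copy()
    for j in range(k):
        mask = labels == j
        centered[mask] -= X[mask].mean(axis=0)      # rigid translation per class
    return np.mean(np.sum(centered ** 2, axis=1))   # equals the k-means cost divided by n

def lloyd_step(X, means):
    """One alternating step: reassign to the nearest class, then update the translations."""
    labels = np.argmin(((X[:, None, :] - means[None, :, :]) ** 2).sum(-1), axis=1)
    means = np.stack([X[labels == j].mean(axis=0) for j in range(len(means))])
    return labels, means

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
means = X[rng.choice(len(X), 2, replace=False)]
for _ in range(10):
    labels, means = lloyd_step(X, means)
print(barycenter_variance(X, labels, 2))
```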


2019 ◽  
Vol 24 (1-2) ◽  
pp. 101-107
Author(s):  
Trokhymchuk R.M.

This work is devoted to the testing, research and comparative analysis of the most well-known and widely used methods and algorithms for clustering numerical data sets. Multidimensional scaling was applied to evaluate the results of solving the clustering problem by visualizing the data sets at all stages of the studied algorithms. All algorithms were tested on artificial and real data sets. As a result, the main characteristics of each investigated algorithm were formulated in the form of its relative strengths and weaknesses. Based on the test results, conclusions and recommendations for using these algorithms are given.
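
A compact sketch of the kind of comparison described above: several widely used clustering algorithms are run on the same numerical data set, the points are embedded with multidimensional scaling, and the resulting partitions are inspected visually. The algorithm choices and parameters here are illustrative assumptions, not the paper's exact protocol.

```python
# Compare a few clustering algorithms and visualize partitions via MDS.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering
from sklearn.datasets import load_wine
from sklearn.manifold import MDS
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_wine().data)
embedding = MDS(n_components=2, random_state=0).fit_transform(X)  # 2-D view for inspection

algorithms = {
    "k-means": KMeans(n_clusters=3, n_init=10, random_state=0),
    "agglomerative": AgglomerativeClustering(n_clusters=3),
    "DBSCAN": DBSCAN(eps=2.3, min_samples=5),
}
fig, axes = plt.subplots(1, len(algorithms), figsize=(12, 4))
for ax, (name, algo) in zip(axes, algorithms.items()):
    labels = algo.fit_predict(X)
    ax.scatter(embedding[:, 0], embedding[:, 1], c=labels, s=15, cmap="tab10")
    ax.set_title(name)
plt.tight_layout()
plt.show()
```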


2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
Yu Wang

Feature space heterogeneity exists in many real-world data sets, so that some features have different importance for classification over different subsets. Moreover, the pattern of feature space heterogeneity might change dynamically over time as more and more data are accumulated. In this paper, we develop an incremental classification algorithm, Supervised Clustering for Classification with Feature Space Heterogeneity (SCCFSH), to address this problem. In our approach, supervised clustering is implemented to obtain a number of clusters such that the samples in each cluster come from the same class. After the removal of outliers, the relevance of the features in each cluster is calculated based on their variation within that cluster. This feature relevance is then incorporated into the distance calculation used for classification. The main advantage of SCCFSH lies in the fact that it is capable of solving a classification problem with feature space heterogeneity in an incremental way, which is favorable for online classification tasks with continuously changing data. Experimental results on a series of data sets and an application to a database marketing problem show the efficiency and effectiveness of the proposed approach.
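
A simplified sketch of the idea described above, not the authors' exact SCCFSH procedure: each class is clustered separately so that every cluster is class-pure, per-cluster feature weights are derived from the within-cluster variation, and a new sample is classified by a feature-weighted distance to the nearest cluster centroid. The specific weighting scheme below is an assumption.

```python
# Cluster-wise feature relevance for distance-based classification (illustrative sketch).
import numpy as np
from sklearn.cluster import KMeans

def fit_clusters(X, y, clusters_per_class=3):
    models = []
    for cls in np.unique(y):
        Xc = X[y == cls]
        km = KMeans(n_clusters=clusters_per_class, n_init=5, random_state=0).fit(Xc)
        for j in range(clusters_per_class):
            members = Xc[km.labels_ == j]
            # Features with small variation inside the cluster are treated as more
            # relevant there; this inverse-variance weighting is an assumed choice.
            weights = 1.0 / (members.var(axis=0) + 1e-6)
            models.append((km.cluster_centers_[j], weights / weights.sum(), cls))
    return models

def predict(models, x):
    dists = [np.sum(w * (x - c) ** 2) for c, w, _ in models]
    return models[int(np.argmin(dists))][2]

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (60, 4)), rng.normal(3, 1, (60, 4))])
y = np.array([0] * 60 + [1] * 60)
models = fit_clusters(X, y)
print(predict(models, rng.normal(3, 1, 4)))
```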


Author(s):  
Chunhua Ren ◽  
Linfu Sun

Abstract: The classic Fuzzy C-means (FCM) algorithm has limited clustering performance and is prone to misclassifying border points. This study offers a bi-directional FCM clustering ensemble approach that takes local information into account (LI_BIFCM) to overcome these challenges and increase clustering quality. First, various membership matrices are created by running FCM multiple times with randomized initial cluster centers, and a vertical ensemble is performed using the maximum membership principle. Second, after each execution of FCM, multiple local membership matrices of the sample points are created using multiple K-nearest neighbors, and a horizontal ensemble is performed; multiple FCM runs thus yield multiple horizontal ensembles. Finally, the final clustering results are obtained by combining the vertical and horizontal clustering ensembles. Twelve data sets were chosen for testing from both synthetic and real data sources. In the experiments, LI_BIFCM outperformed four traditional clustering algorithms and three clustering ensemble algorithms. Furthermore, the final clustering results have a weak correlation with the bi-directional cluster ensemble parameters, indicating that the suggested technique is robust.
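
A minimal fuzzy c-means core with randomized initial centers and max-membership hardening, two ingredients of the ensemble described above. The vertical and horizontal ensemble steps themselves are omitted; this is a sketch, not LI_BIFCM.

```python
# Plain FCM with random initialization, run several times for later ensembling.
import numpy as np

def fcm(X, c, m=2.0, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), c, replace=False)]           # random initial cluster centers
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-10
        # Standard FCM membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
        u = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1)), axis=2)
        um = u ** m
        centers = (um.T @ X) / um.sum(axis=0)[:, None]           # weighted center update
    return u, centers

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (100, 2)), rng.normal(3, 0.5, (100, 2))])
memberships = [fcm(X, c=2, seed=s)[0] for s in range(5)]         # multiple FCM runs
hard_labels = [u.argmax(axis=1) for u in memberships]            # maximum membership principle
```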


2021 ◽  
pp. 1-18
Author(s):  
Angeliki Koutsimpela ◽  
Konstantinos D. Koutroumbas

Several well-known clustering algorithms have online counterparts designed to deal effectively with the big data issue, as well as with the case where the data become available in a streaming fashion. However, very few of them follow the stochastic gradient descent philosophy, despite the fact that the latter enjoys certain practical advantages (such as the possibility of (a) running faster than batch-processing counterparts and (b) escaping from local minima of the associated cost function), while, in addition, strong theoretical convergence results have been established for it. In this paper a novel stochastic gradient descent possibilistic clustering algorithm, called O-PCM2, is introduced. The algorithm is presented in detail and it is rigorously proved that the gradient of the associated cost function tends to zero in the $L_2$ sense, based on general convergence results established for the family of stochastic gradient descent algorithms. Furthermore, an additional discussion is provided on the nature of the points where the algorithm may converge. Finally, the performance of the proposed algorithm is tested against other related algorithms on both synthetic and real data sets.
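
A hedged sketch in the spirit of the algorithm above, not the exact O-PCM2 update rules: for every streaming point, possibilistic (typicality) degrees are computed and a stochastic gradient step is taken on the cluster representatives. The learning-rate schedule and the eta values are assumptions.

```python
# Online possibilistic clustering via stochastic gradient descent (illustrative sketch).
import numpy as np

def online_pcm(stream, n_clusters, eta, m=2.0, lr0=0.5):
    rng = np.random.default_rng(0)
    theta = rng.normal(size=(n_clusters, stream.shape[1]))       # cluster representatives
    for t, x in enumerate(stream, start=1):
        d2 = np.sum((theta - x) ** 2, axis=1)
        # Possibilistic membership: u_j = 1 / (1 + (d_j^2 / eta_j)^(1/(m-1)))
        u = 1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1)))
        lr = lr0 / np.sqrt(t)                                    # diminishing step size
        # Stochastic gradient step on the PCM cost with respect to each representative.
        theta += lr * (u ** m)[:, None] * (x - theta)
    return theta

rng = np.random.default_rng(1)
stream = np.vstack([rng.normal(0, 0.4, (500, 2)), rng.normal(4, 0.4, (500, 2))])
rng.shuffle(stream)
print(online_pcm(stream, n_clusters=2, eta=np.array([1.0, 1.0])))
```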


Author(s):  
Lifang Zhou ◽  
Guang Deng ◽  
Weisheng Li ◽  
Jianxun Mi ◽  
Bangjun Lei

Current state-of-the-art detectors achieve impressive detection accuracy through deep learning. However, most of these detectors cannot detect objects in real time due to their heavy computational cost, which limits their wide application. Although some one-stage detectors are designed to accelerate detection, their speed is still unsatisfactory for tasks on high-resolution remote sensing images. To address this problem, a lightweight one-stage approach based on YOLOv3, named Squeeze-and-Excitation YOLOv3 (SE-YOLOv3), is proposed in this paper. The proposed algorithm maintains high efficiency and effectiveness simultaneously. To reduce the number of parameters and increase the descriptive power of features, two customized modules, lightweight feature extraction and attention-aware feature augmentation, are embedded; they exploit global information and suppress redundant features, respectively. To achieve scale invariance, a spatial pyramid pooling method is used to aggregate local features. Evaluation experiments on two remote sensing image data sets, DOTA and NWPU VHR-10, show that the proposed approach achieves more competitive detection results with less computational consumption.
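
Minimal PyTorch sketches of two generic building blocks referenced above, a squeeze-and-excitation channel-attention block and a spatial pyramid pooling layer. They illustrate the mechanisms only; the actual SE-YOLOv3 architecture, layer sizes and module placement are not reproduced here.

```python
# Generic SE attention and SPP modules (illustrative, not the paper's exact modules).
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze global spatial information, then re-weight channels."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))          # squeeze: global average pooling
        return x * w.view(b, c, 1, 1)            # excitation: channel-wise rescaling

class SPP(nn.Module):
    """Aggregate local features with max-pooling at several scales."""
    def __init__(self, kernel_sizes=(5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.MaxPool2d(k, stride=1, padding=k // 2) for k in kernel_sizes)

    def forward(self, x):
        return torch.cat([x] + [p(x) for p in self.pools], dim=1)

feat = torch.randn(1, 256, 13, 13)
print(SEBlock(256)(feat).shape, SPP()(feat).shape)   # (1, 256, 13, 13) and (1, 1024, 13, 13)
```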


Author(s):  
Deepali Virmani ◽  
Nikita Jain ◽  
Ketan Parikh ◽  
Shefali Upadhyaya ◽  
Abhishek Srivastav

This article describes how data become relevant when they can be organized, linked with other data and grouped into clusters. Clustering is the process of organizing a given set of objects into a set of disjoint groups called clusters. There are a number of clustering algorithms, such as k-means, k-medoids and normalized k-means, so the focus remains on the efficiency and accuracy of the algorithms, on the time clustering takes, and on reducing the overlap between clusters. K-means is one of the simplest unsupervised learning algorithms that solves the well-known clustering problem. The k-means algorithm partitions the data into K clusters around randomly chosen initial centroids; because it works only with numeric values, it cannot directly cluster real-world data containing categorical attributes, and poor selection of the initial centroids can result in poor clustering. This article proposes a variant of k-means that selects the initial centres deliberately and normalizes the data, resulting in better clustering, reduced overlap and less time required for clustering.
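
A sketch of the two modifications highlighted above, under assumed details: the data are min-max normalized first, and the initial centres are chosen by a simple farthest-point heuristic instead of purely at random, before standard k-means is run. The article's own centre-selection rule may differ.

```python
# Normalization plus deterministic initial-centre selection before k-means (sketch).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

def farthest_point_centres(X, k, seed=0):
    rng = np.random.default_rng(seed)
    centres = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centres], axis=0)
        centres.append(X[np.argmax(d)])      # pick the point farthest from the chosen centres
    return np.array(centres)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(8, 1, (100, 2)), rng.normal(16, 1, (100, 2))])
X_norm = MinMaxScaler().fit_transform(X)                         # normalization step
init = farthest_point_centres(X_norm, k=3)
labels = KMeans(n_clusters=3, init=init, n_init=1).fit_predict(X_norm)
```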


2019 ◽  
Vol 22 (2) ◽  
pp. 255-270 ◽  
Author(s):  
Manuel D. Ortigueira ◽  
Valeriy Martynyuk ◽  
Mykola Fedula ◽  
J. Tenreiro Machado

Abstract: The ability of the so-called Caputo-Fabrizio (CF) and Atangana-Baleanu (AB) operators to create suitable models for real-world data is tested. Two alternative models based on the CF and AB operators are assessed and compared with known models on data sets obtained from electrochemical capacitors and from the electrical impedance of the human body. The results show that the CF and AB descriptions perform poorly when compared with the classical fractional derivatives.
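
For reference, the commonly quoted definitions of the operators being compared (classical Caputo, CF, and AB in the Caputo sense), where $M(\alpha)$ and $B(\alpha)$ are normalization functions and $E_{\alpha}$ is the Mittag-Leffler function; these standard forms are given here for orientation and are not reproduced from the paper itself:

$$
{}^{C}D_t^{\alpha}f(t)=\frac{1}{\Gamma(1-\alpha)}\int_0^t \frac{f'(\tau)}{(t-\tau)^{\alpha}}\,d\tau,\qquad
{}^{CF}D_t^{\alpha}f(t)=\frac{M(\alpha)}{1-\alpha}\int_0^t f'(\tau)\,e^{-\frac{\alpha(t-\tau)}{1-\alpha}}\,d\tau,
$$
$$
{}^{ABC}D_t^{\alpha}f(t)=\frac{B(\alpha)}{1-\alpha}\int_0^t f'(\tau)\,E_{\alpha}\!\left(-\frac{\alpha(t-\tau)^{\alpha}}{1-\alpha}\right)d\tau,\qquad 0<\alpha<1 .
$$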


Geophysics ◽  
2020 ◽  
Vol 85 (2) ◽  
pp. V223-V232 ◽  
Author(s):  
Zhicheng Geng ◽  
Xinming Wu ◽  
Sergey Fomel ◽  
Yangkang Chen

The seislet transform uses the wavelet-lifting scheme and local slopes to analyze seismic data. In its definition, the design of prediction operators specifically for seismic images and data is an important issue. We have developed a new formulation of the seislet transform based on the relative time (RT) attribute. This method uses the RT volume to construct multiscale prediction operators. With the new prediction operators, the seislet transform is accelerated because distant traces are predicted directly. We apply our method to synthetic and real data to demonstrate that the new approach reduces computational cost and obtains an excellent sparse representation on the test data sets.
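
A toy sketch of the prediction idea described above: with a relative-time volume available, a distant trace can be predicted directly from a reference trace by matching samples that share the same RT value (plain 1-D interpolation here). The dipping-layer model and array shapes below are synthetic illustrations, not the authors' operator construction.

```python
# Predict a distant trace by reading the reference trace at equal RT values.
import numpy as np

n_traces, n_samples = 50, 200
t = np.arange(n_samples, dtype=float)
# Synthetic RT volume: reflectors dip linearly across traces (RT = time - shift).
rt = t[None, :] - 0.5 * np.arange(n_traces)[:, None]
reference = np.sin(2 * np.pi * rt[0] / 25.0)                    # reference trace amplitudes

def predict_trace(reference, rt_ref, rt_target):
    """Predict a target trace by sampling the reference where RT values coincide."""
    return np.interp(rt_target, rt_ref, reference)

predicted = predict_trace(reference, rt[0], rt[40])             # predict a distant trace directly
actual = np.sin(2 * np.pi * rt[40] / 25.0)
print(np.max(np.abs(predicted[25:] - actual[25:])))             # near-zero away from the edge
```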


2020 ◽  
Vol 2020 ◽  
pp. 1-15
Author(s):  
Peng Zhang ◽  
Kun She

The target of clustering analysis is to group a set of data points into several clusters based on similarity or distance. In numerous traditional clustering algorithms this similarity or distance is a scalar. Nevertheless, a vector, such as the data gravitational force, contains more information than a scalar and can be applied in clustering analysis to improve clustering performance. Therefore, this paper proposes a three-stage hierarchical clustering approach called GHC, which takes advantage of the vector character of the data gravitational force inspired by the law of universal gravitation. In the first stage, a sparse gravitational graph is constructed based on the top k data gravitations between each data point and its neighbors in the local region. The sparse graph is then partitioned into many subgraphs by the gravitational influence coefficient. In the last stage, a satisfactory clustering result is obtained by iteratively merging these subgraphs using a new linkage criterion. To demonstrate the performance of the GHC algorithm, experiments are conducted on synthetic and real-world data sets, and the results show that GHC achieves better performance than other existing clustering algorithms.
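
A sketch of the first stage described above, with assumed details: unit masses, gravitation magnitude proportional to the inverse squared distance, and a sparse graph that keeps only the top-k gravitations between each point and its local neighbors. The later partitioning and merging stages of GHC are not reproduced here.

```python
# Stage 1 of a gravitation-based clustering pipeline: build a sparse gravitational graph.
import numpy as np
from scipy.sparse import coo_matrix
from sklearn.datasets import make_moons
from sklearn.neighbors import NearestNeighbors

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
k = 6
dist, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
dist, idx = dist[:, 1:], idx[:, 1:]                      # drop each point itself

gravitation = 1.0 / (dist ** 2 + 1e-12)                  # G * m_i * m_j / d^2 with unit masses
rows = np.repeat(np.arange(len(X)), k)
graph = coo_matrix((gravitation.ravel(), (rows, idx.ravel())), shape=(len(X), len(X)))
graph = graph.maximum(graph.T)                           # symmetrize the sparse gravitational graph
print(graph.nnz, "edges kept in the sparse graph")
```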

