A Three-Level Optimization Model for Nonlinearly Separable Clustering

2020 ◽  
Vol 34 (04) ◽  
pp. 3211-3218
Author(s):  
Liang Bai ◽  
Jiye Liang

Due to the complex structure of real-world data, nonlinearly separable clustering is one of the most popular and widely studied clustering problems. Various types of algorithms, such as kernel k-means, spectral clustering and density clustering, have been developed to solve it. However, they struggle to balance the efficiency and effectiveness of clustering, which limits their practical application. To overcome this deficiency, we propose a three-level optimization model for nonlinearly separable clustering that divides the clustering problem into three sub-problems: a linearly separable clustering on the object set, a nonlinearly separable clustering on the cluster set and an ensemble clustering on the partition set. An iterative algorithm is proposed to solve the optimization problem. The proposed algorithm recognizes nonlinearly separable clusters effectively at low computational cost. Its performance has been studied on synthetic and real data sets. Comparisons with other nonlinearly separable clustering algorithms illustrate the efficiency and effectiveness of the proposed algorithm.
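
The following is a minimal, one-pass sketch of the three-level idea, not the authors' iterative optimization: level one runs k-means on the objects, level two runs a nonlinear (spectral) clustering on the resulting cluster centers, and level three combines several such partitions with a simple co-association ensemble. All parameter values are illustrative assumptions.

```python
# One-pass sketch of a three-level nonlinear clustering pipeline (illustrative only).
import numpy as np
from sklearn.cluster import KMeans, SpectralClustering, AgglomerativeClustering
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=600, noise=0.06, random_state=0)
n_partitions, n_prototypes, n_clusters = 5, 40, 2

labels_set = []
for seed in range(n_partitions):
    # Level 1: many small, linearly separable clusters on the object set.
    km = KMeans(n_clusters=n_prototypes, n_init=5, random_state=seed).fit(X)
    # Level 2: nonlinearly separable clustering on the prototype (cluster-center) set.
    proto_labels = SpectralClustering(n_clusters=n_clusters, affinity="nearest_neighbors",
                                      n_neighbors=8, random_state=seed).fit_predict(km.cluster_centers_)
    labels_set.append(proto_labels[km.labels_])  # map object -> prototype -> class

# Level 3: ensemble clustering on the partition set via a co-association matrix.
L = np.array(labels_set)
coassoc = (L[:, :, None] == L[:, None, :]).mean(axis=0)
final = AgglomerativeClustering(n_clusters=n_clusters, metric="precomputed",
                                linkage="average").fit_predict(1.0 - coassoc)
```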

Author(s):  
Hongkang Yang ◽  
Esteban G Tabak

Abstract: The clustering problem, and more generally latent factor discovery or latent space inference, is formulated in terms of the Wasserstein barycenter problem from optimal transport. The proposed objective is the maximization of the variability attributable to class, further characterized as the minimization of the variance of the Wasserstein barycenter. Existing theory, which constrains the transport maps to rigid translations, is extended to affine transformations. The resulting non-parametric clustering algorithms include $k$-means as a special case and exhibit more robust performance. A continuous version of these algorithms discovers continuous latent variables and generalizes principal curves. The strength of these algorithms is demonstrated by tests on both artificial and real-world data sets.
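
A toy sketch of the rigid-translation special case mentioned above: each class is transported onto the common barycenter by a translation, and the variance that remains coincides with the $k$-means objective. This is an illustrative reading of the formulation under that assumption, not the authors' implementation.

```python
# Rigid-translation special case: minimizing the barycenter variance reproduces k-means.
import numpy as np

def barycenter_variance(X, labels, k):
    """Variance left after translating every class onto the common barycenter."""
    centered = X.copy()
    for j in range(k):
        mask = labels == j
        centered[mask] -= X[mask].mean(axis=0)      # rigid translation per class
    return np.mean(np.sum(centered ** 2, axis=1))   # equals the k-means cost divided by n

def lloyd_step(X, means):
    """One alternating step: reassign to the nearest class, then update the translations."""
    labels = np.argmin(((X[:, None, :] - means[None, :, :]) ** 2).sum(-1), axis=1)
    means = np.stack([X[labels == j].mean(axis=0) for j in range(len(means))])
    return labels, means

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
means = X[rng.choice(len(X), 2, replace=False)]
for _ in range(10):
    labels, means = lloyd_step(X, means)
print(barycenter_variance(X, labels, 2))
```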


2019 ◽  
Vol 24 (1-2) ◽  
pp. 101-107
Author(s):  
Trokhymchuk R.M.

This work is devoted to the testing, research and comparative analysis of the most well-known and widely used methods and algorithms for clustering numerical data sets. Multidimensional scaling was applied to evaluate the results of solving the clustering problem by visualizing the data sets at all stages of the studied algorithms. All algorithms were tested on artificial and real data sets. As a result, the main characteristics of each investigated algorithm were formulated in the form of its relative strengths and weaknesses. Based on the test results, conclusions and recommendations for using these algorithms are given.
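
A compact sketch of the kind of comparison described above: several widely used clustering algorithms are run on the same numerical data set, the points are embedded with multidimensional scaling, and the resulting partitions are inspected visually. The algorithm choices and parameters here are illustrative assumptions, not the paper's exact protocol.

```python
# Compare a few clustering algorithms and visualize partitions via MDS.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering
from sklearn.datasets import load_wine
from sklearn.manifold import MDS
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_wine().data)
embedding = MDS(n_components=2, random_state=0).fit_transform(X)  # 2-D view for inspection

algorithms = {
    "k-means": KMeans(n_clusters=3, n_init=10, random_state=0),
    "agglomerative": AgglomerativeClustering(n_clusters=3),
    "DBSCAN": DBSCAN(eps=2.3, min_samples=5),
}
fig, axes = plt.subplots(1, len(algorithms), figsize=(12, 4))
for ax, (name, algo) in zip(axes, algorithms.items()):
    labels = algo.fit_predict(X)
    ax.scatter(embedding[:, 0], embedding[:, 1], c=labels, s=15, cmap="tab10")
    ax.set_title(name)
plt.tight_layout()
plt.show()
```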


2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
Yu Wang

Feature space heterogeneity exists in many real-world data sets, so that some features have different importance for classification over different subsets. Moreover, the pattern of feature space heterogeneity might change dynamically over time as more and more data are accumulated. In this paper, we develop an incremental classification algorithm, Supervised Clustering for Classification with Feature Space Heterogeneity (SCCFSH), to address this problem. In our approach, supervised clustering is implemented to obtain a number of clusters such that the samples in each cluster come from the same class. After the removal of outliers, the relevance of the features in each cluster is calculated based on their variation within that cluster. This feature relevance is then incorporated into the distance calculation used for classification. The main advantage of SCCFSH lies in the fact that it is capable of solving a classification problem with feature space heterogeneity in an incremental way, which is favorable for online classification tasks with continuously changing data. Experimental results on a series of data sets and an application to a database marketing problem show the efficiency and effectiveness of the proposed approach.
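
A simplified sketch of the idea described above, not the authors' exact SCCFSH procedure: each class is clustered separately so that every cluster is class-pure, per-cluster feature weights are derived from the within-cluster variation, and a new sample is classified by a feature-weighted distance to the nearest cluster centroid. The specific weighting scheme below is an assumption.

```python
# Cluster-wise feature relevance for distance-based classification (illustrative sketch).
import numpy as np
from sklearn.cluster import KMeans

def fit_clusters(X, y, clusters_per_class=3):
    models = []
    for cls in np.unique(y):
        Xc = X[y == cls]
        km = KMeans(n_clusters=clusters_per_class, n_init=5, random_state=0).fit(Xc)
        for j in range(clusters_per_class):
            members = Xc[km.labels_ == j]
            # Features with small variation inside the cluster are treated as more
            # relevant there; this inverse-variance weighting is an assumed choice.
            weights = 1.0 / (members.var(axis=0) + 1e-6)
            models.append((km.cluster_centers_[j], weights / weights.sum(), cls))
    return models

def predict(models, x):
    dists = [np.sum(w * (x - c) ** 2) for c, w, _ in models]
    return models[int(np.argmin(dists))][2]

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (60, 4)), rng.normal(3, 1, (60, 4))])
y = np.array([0] * 60 + [1] * 60)
models = fit_clusters(X, y)
print(predict(models, rng.normal(3, 1, 4)))
```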


Author(s):  
Chunhua Ren ◽  
Linfu Sun

Abstract: The classic Fuzzy C-means (FCM) algorithm has limited clustering performance and is prone to misclassifying border points. This study offers a bi-directional FCM clustering ensemble approach that takes local information into account (LI_BIFCM) to overcome these challenges and increase clustering quality. First, various membership matrices are created by running FCM multiple times with randomized initial cluster centers, and a vertical ensemble is performed using the maximum membership principle. Second, after each execution of FCM, multiple local membership matrices of the sample points are created using multiple K-nearest neighbors, and a horizontal ensemble is performed; multiple FCM runs thus yield multiple horizontal ensembles. Finally, the final clustering results are obtained by combining the vertical and horizontal clustering ensembles. Twelve data sets were chosen for testing from both synthetic and real data sources. In the experiments, LI_BIFCM outperformed four traditional clustering algorithms and three clustering ensemble algorithms. Furthermore, the final clustering results have a weak correlation with the bi-directional cluster ensemble parameters, indicating that the suggested technique is robust.
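
A minimal fuzzy c-means core with randomized initial centers and max-membership hardening, two ingredients of the ensemble described above. The vertical and horizontal ensemble steps themselves are omitted; this is a sketch, not LI_BIFCM.

```python
# Plain FCM with random initialization, run several times for later ensembling.
import numpy as np

def fcm(X, c, m=2.0, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), c, replace=False)]           # random initial cluster centers
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-10
        # Standard FCM membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
        u = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1)), axis=2)
        um = u ** m
        centers = (um.T @ X) / um.sum(axis=0)[:, None]           # weighted center update
    return u, centers

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (100, 2)), rng.normal(3, 0.5, (100, 2))])
memberships = [fcm(X, c=2, seed=s)[0] for s in range(5)]         # multiple FCM runs
hard_labels = [u.argmax(axis=1) for u in memberships]            # maximum membership principle
```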


2021 ◽  
pp. 1-18
Author(s):  
Angeliki Koutsimpela ◽  
Konstantinos D. Koutroumbas

Several well-known clustering algorithms have online counterparts designed to deal effectively with the big data issue, as well as with the case where the data become available in a streaming fashion. However, very few of them follow the stochastic gradient descent philosophy, despite the fact that the latter enjoys certain practical advantages (such as the possibility of (a) running faster than batch-processing counterparts and (b) escaping from local minima of the associated cost function), while, in addition, strong theoretical convergence results have been established for it. In this paper a novel stochastic gradient descent possibilistic clustering algorithm, called O-PCM2, is introduced. The algorithm is presented in detail and it is rigorously proved that the gradient of the associated cost function tends to zero in the $L_2$ sense, based on general convergence results established for the family of stochastic gradient descent algorithms. Furthermore, an additional discussion is provided on the nature of the points where the algorithm may converge. Finally, the performance of the proposed algorithm is tested against other related algorithms on both synthetic and real data sets.
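
A hedged sketch in the spirit of the algorithm above, not the exact O-PCM2 update rules: for every streaming point, possibilistic (typicality) degrees are computed and a stochastic gradient step is taken on the cluster representatives. The learning-rate schedule and the eta values are assumptions.

```python
# Online possibilistic clustering via stochastic gradient descent (illustrative sketch).
import numpy as np

def online_pcm(stream, n_clusters, eta, m=2.0, lr0=0.5):
    rng = np.random.default_rng(0)
    theta = rng.normal(size=(n_clusters, stream.shape[1]))       # cluster representatives
    for t, x in enumerate(stream, start=1):
        d2 = np.sum((theta - x) ** 2, axis=1)
        # Possibilistic membership: u_j = 1 / (1 + (d_j^2 / eta_j)^(1/(m-1)))
        u = 1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1)))
        lr = lr0 / np.sqrt(t)                                    # diminishing step size
        # Stochastic gradient step on the PCM cost with respect to each representative.
        theta += lr * (u ** m)[:, None] * (x - theta)
    return theta

rng = np.random.default_rng(1)
stream = np.vstack([rng.normal(0, 0.4, (500, 2)), rng.normal(4, 0.4, (500, 2))])
rng.shuffle(stream)
print(online_pcm(stream, n_clusters=2, eta=np.array([1.0, 1.0])))
```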


Author(s):  
Lifang Zhou ◽  
Guang Deng ◽  
Weisheng Li ◽  
Jianxun Mi ◽  
Bangjun Lei

Current state-of-the-art detectors achieve impressive detection accuracy through deep learning. However, most of these detectors cannot detect objects in real time due to their heavy computational cost, which limits their wide application. Although some one-stage detectors are designed to accelerate detection, their speed is still unsatisfactory for tasks on high-resolution remote sensing images. To address this problem, a lightweight one-stage approach based on YOLOv3, named Squeeze-and-Excitation YOLOv3 (SE-YOLOv3), is proposed in this paper. The proposed algorithm maintains high efficiency and effectiveness simultaneously. To reduce the number of parameters and increase the descriptive power of features, two customized modules, lightweight feature extraction and attention-aware feature augmentation, are embedded; they exploit global information and suppress redundant features, respectively. To achieve scale invariance, a spatial pyramid pooling method is used to aggregate local features. Evaluation experiments on two remote sensing image data sets, DOTA and NWPU VHR-10, show that the proposed approach achieves more competitive detection results with less computational consumption.
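
Minimal PyTorch sketches of two generic building blocks referenced above, a squeeze-and-excitation channel-attention block and a spatial pyramid pooling layer. They illustrate the mechanisms only; the actual SE-YOLOv3 architecture, layer sizes and module placement are not reproduced here.

```python
# Generic SE attention and SPP modules (illustrative, not the paper's exact modules).
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze global spatial information, then re-weight channels."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))          # squeeze: global average pooling
        return x * w.view(b, c, 1, 1)            # excitation: channel-wise rescaling

class SPP(nn.Module):
    """Aggregate local features with max-pooling at several scales."""
    def __init__(self, kernel_sizes=(5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.MaxPool2d(k, stride=1, padding=k // 2) for k in kernel_sizes)

    def forward(self, x):
        return torch.cat([x] + [p(x) for p in self.pools], dim=1)

feat = torch.randn(1, 256, 13, 13)
print(SEBlock(256)(feat).shape, SPP()(feat).shape)   # (1, 256, 13, 13) and (1, 1024, 13, 13)
```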


Author(s):  
Deepali Virmani ◽  
Nikita Jain ◽  
Ketan Parikh ◽  
Shefali Upadhyaya ◽  
Abhishek Srivastav

This article describes how data become relevant when they can be organized, linked with other data and grouped into clusters. Clustering is the process of organizing a given set of objects into a set of disjoint groups called clusters. There are a number of clustering algorithms, such as k-means, k-medoids and normalized k-means, so the focus remains on the efficiency and accuracy of the algorithms, on the time clustering takes, and on reducing the overlap between clusters. K-means is one of the simplest unsupervised learning algorithms that solves the well-known clustering problem. The k-means algorithm partitions the data into K clusters around randomly chosen initial centroids; because it works only with numeric values, it cannot directly cluster real-world data containing categorical attributes, and poor selection of the initial centroids can result in poor clustering. This article proposes a variant of k-means that selects the initial centres deliberately and normalizes the data, resulting in better clustering, reduced overlap and less time required for clustering.
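
A sketch of the two modifications highlighted above, under assumed details: the data are min-max normalized first, and the initial centres are chosen by a simple farthest-point heuristic instead of purely at random, before standard k-means is run. The article's own centre-selection rule may differ.

```python
# Normalization plus deterministic initial-centre selection before k-means (sketch).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

def farthest_point_centres(X, k, seed=0):
    rng = np.random.default_rng(seed)
    centres = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centres], axis=0)
        centres.append(X[np.argmax(d)])      # pick the point farthest from the chosen centres
    return np.array(centres)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(8, 1, (100, 2)), rng.normal(16, 1, (100, 2))])
X_norm = MinMaxScaler().fit_transform(X)                         # normalization step
init = farthest_point_centres(X_norm, k=3)
labels = KMeans(n_clusters=3, init=init, n_init=1).fit_predict(X_norm)
```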


2019 ◽  
Vol 22 (2) ◽  
pp. 255-270 ◽  
Author(s):  
Manuel D. Ortigueira ◽  
Valeriy Martynyuk ◽  
Mykola Fedula ◽  
J. Tenreiro Machado

Abstract: The ability of the so-called Caputo-Fabrizio (CF) and Atangana-Baleanu (AB) operators to create suitable models for real-world data is tested. Two alternative models based on the CF and AB operators are assessed and compared with known models on data sets obtained from electrochemical capacitors and from the electrical impedance of the human body. The results show that the CF and AB descriptions perform poorly when compared with the classical fractional derivatives.
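
For reference, the commonly quoted definitions of the operators being compared (classical Caputo, CF, and AB in the Caputo sense), where $M(\alpha)$ and $B(\alpha)$ are normalization functions and $E_{\alpha}$ is the Mittag-Leffler function; these standard forms are given here for orientation and are not reproduced from the paper itself:

$$
{}^{C}D_t^{\alpha}f(t)=\frac{1}{\Gamma(1-\alpha)}\int_0^t \frac{f'(\tau)}{(t-\tau)^{\alpha}}\,d\tau,\qquad
{}^{CF}D_t^{\alpha}f(t)=\frac{M(\alpha)}{1-\alpha}\int_0^t f'(\tau)\,e^{-\frac{\alpha(t-\tau)}{1-\alpha}}\,d\tau,
$$
$$
{}^{ABC}D_t^{\alpha}f(t)=\frac{B(\alpha)}{1-\alpha}\int_0^t f'(\tau)\,E_{\alpha}\!\left(-\frac{\alpha(t-\tau)^{\alpha}}{1-\alpha}\right)d\tau,\qquad 0<\alpha<1 .
$$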


Geophysics ◽  
2020 ◽  
Vol 85 (2) ◽  
pp. V223-V232 ◽  
Author(s):  
Zhicheng Geng ◽  
Xinming Wu ◽  
Sergey Fomel ◽  
Yangkang Chen

The seislet transform uses the wavelet-lifting scheme and local slopes to analyze seismic data. In its definition, the design of prediction operators specifically for seismic images and data is an important issue. We have developed a new formulation of the seislet transform based on the relative time (RT) attribute. This method uses the RT volume to construct multiscale prediction operators. With the new prediction operators, the seislet transform is accelerated because distant traces are predicted directly. We apply our method to synthetic and real data to demonstrate that the new approach reduces computational cost and obtains an excellent sparse representation on the test data sets.
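
A toy sketch of the prediction idea described above: with a relative-time volume available, a distant trace can be predicted directly from a reference trace by matching samples that share the same RT value (plain 1-D interpolation here). The dipping-layer model and array shapes below are synthetic illustrations, not the authors' operator construction.

```python
# Predict a distant trace by reading the reference trace at equal RT values.
import numpy as np

n_traces, n_samples = 50, 200
t = np.arange(n_samples, dtype=float)
# Synthetic RT volume: reflectors dip linearly across traces (RT = time - shift).
rt = t[None, :] - 0.5 * np.arange(n_traces)[:, None]
reference = np.sin(2 * np.pi * rt[0] / 25.0)                    # reference trace amplitudes

def predict_trace(reference, rt_ref, rt_target):
    """Predict a target trace by sampling the reference where RT values coincide."""
    return np.interp(rt_target, rt_ref, reference)

predicted = predict_trace(reference, rt[0], rt[40])             # predict a distant trace directly
actual = np.sin(2 * np.pi * rt[40] / 25.0)
print(np.max(np.abs(predicted[25:] - actual[25:])))             # near-zero away from the edge
```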


2020 ◽  
Vol 2020 ◽  
pp. 1-15
Author(s):  
Peng Zhang ◽  
Kun She

The target of clustering analysis is to group a set of data points into several clusters based on similarity or distance. In numerous traditional clustering algorithms this similarity or distance is a scalar. Nevertheless, a vector, such as the data gravitational force, contains more information than a scalar and can be applied in clustering analysis to improve clustering performance. Therefore, this paper proposes a three-stage hierarchical clustering approach called GHC, which takes advantage of the vector character of the data gravitational force inspired by the law of universal gravitation. In the first stage, a sparse gravitational graph is constructed based on the top k data gravitations between each data point and its neighbors in the local region. The sparse graph is then partitioned into many subgraphs by the gravitational influence coefficient. In the last stage, a satisfactory clustering result is obtained by iteratively merging these subgraphs using a new linkage criterion. To demonstrate the performance of the GHC algorithm, experiments are conducted on synthetic and real-world data sets, and the results show that GHC achieves better performance than other existing clustering algorithms.
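
A sketch of the first stage described above, with assumed details: unit masses, gravitation magnitude proportional to the inverse squared distance, and a sparse graph that keeps only the top-k gravitations between each point and its local neighbors. The later partitioning and merging stages of GHC are not reproduced here.

```python
# Stage 1 of a gravitation-based clustering pipeline: build a sparse gravitational graph.
import numpy as np
from scipy.sparse import coo_matrix
from sklearn.datasets import make_moons
from sklearn.neighbors import NearestNeighbors

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
k = 6
dist, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
dist, idx = dist[:, 1:], idx[:, 1:]                      # drop each point itself

gravitation = 1.0 / (dist ** 2 + 1e-12)                  # G * m_i * m_j / d^2 with unit masses
rows = np.repeat(np.arange(len(X)), k)
graph = coo_matrix((gravitation.ravel(), (rows, idx.ravel())), shape=(len(X), len(X)))
graph = graph.maximum(graph.T)                           # symmetrize the sparse gravitational graph
print(graph.nnz, "edges kept in the sparse graph")
```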

