A Rank-Constrained Clustering Algorithm with Adaptive Embedding

Constrained clustering aims to improve accuracy and personalization by exploiting constraints expressed by an Oracle. In this paper, a new constrained clustering algorithm is proposed in which informative data pairs are selected through an iterative process. These pairs are presented to the Oracle, which labels each relation as “Must-link (ML)” or “Cannot-link (CL).” In each iteration, a support vector machine (SVM) is first trained on the labels produced by the current clustering. A distance matrix is built from the distance of each document to the SVM hyperplane, and a similarity matrix is built from the cosine similarity of the documents' word2vec representations. Two types of probability (similarity and degree of similarity) are calculated and smoothed to estimate each document's membership in the neighborhoods, where a neighborhood is a set of samples that the Oracle has labeled as belonging to the same cluster. At the end of each iteration, the data with the greatest uncertainty (in terms of these probabilities) are selected for querying the Oracle. For evaluation, the proposed method is compared with well-known state-of-the-art methods on a standard dataset using two criteria. The results show higher accuracy and greater stability with fewer queries.
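
The abstract does not give the probability formulas, so the following Python sketch only illustrates the query-selection idea under stated assumptions: documents are represented by embedding vectors (for example averaged word2vec vectors), a document's membership probability for each Oracle-labeled neighborhood is taken to be proportional to its mean cosine similarity to that neighborhood's members, and the document with the highest entropy over those probabilities is chosen for the next Oracle query. The SVM-based distance matrix and the smoothing step are omitted, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two embedding vectors (0.0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def neighborhood_probs(doc: Vector, neighborhoods: Dict[str, List[Vector]]) -> Dict[str, float]:
    """Hypothetical membership model: P(doc in N) is proportional to the doc's
    mean cosine similarity to N's Oracle-labeled members, rescaled to [0, 1]."""
    scores = {}
    for name, members in neighborhoods.items():
        mean_sim = sum(cosine(doc, m) for m in members) / len(members)
        scores[name] = (mean_sim + 1.0) / 2.0  # map [-1, 1] onto [0, 1]
    total = sum(scores.values())
    if total == 0.0:  # degenerate case: fall back to a uniform distribution
        return {name: 1.0 / len(scores) for name in scores}
    return {name: s / total for name, s in scores.items()}


def entropy(probs) -> float:
    """Shannon entropy in nats; higher means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def most_uncertain(docs: Dict[str, Vector], neighborhoods: Dict[str, List[Vector]]) -> str:
    """Return the id of the unlabeled document whose neighborhood membership is most uncertain."""
    return max(docs, key=lambda d: entropy(neighborhood_probs(docs[d], neighborhoods).values()))


if __name__ == "__main__":
    neighborhoods = {"A": [[1.0, 0.0]], "B": [[0.0, 1.0]]}
    docs = {"d1": [0.9, 0.1], "d2": [0.7, 0.7]}
    print(most_uncertain(docs, neighborhoods))  # d2: equally similar to A and B
```

Entropy is one common uncertainty score; the paper's own criterion is stated only as "greater level of uncertainty (in terms of probability)".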

Optimal distributed interconnectivity of multi-robot systems by spatially-constrained clustering

Adaptive Behavior, Vol 25 (2), pp. 96-113, 2017
DOI: 10.1177/1059712317700500
Cited By ~ 2

Author(s): Matin Macktoobian, Mahdi Aliyari Sh

Keyword(s): Clustering Algorithm, Task Assignment, Constrained Clustering, Loosely Coupled, Robot Systems, Multi Agent, Probabilistic Proof, Data Passing, Spatially Constrained Clustering, Multi Robot

This paper presents a spatially-constrained clustering algorithm. It is a distributed clustering approach that uses a set of spatial constraints to fine-tune the optimal distances between the agents of the system and thereby strengthen data passing among them. The method increases interconnectivity among agents and clusters, which improves the overall communicative functionality of the multi-robot system, and it establishes loosely-coupled connections among the clusters. These implicit interconnections allow the clusters to receive and transmit information within the multi-agent system. Concretely, the algorithm assigns each agent to the cluster with the lowest cost of local communication with its peers. Through a probabilistic proof of the optimality achieved, this research shows that the decentralized method boosts the communicative agility of the swarm. Because the method is decentralized, the common assumption that the agents' primary locations are fully known is relaxed entirely, in contrast to former methods. This supports the algorithm's reliability and efficiency. Furthermore, the method's efficacy in passing information improves the functionality of higher-level swarm operations, such as task assignment and swarm flocking. Analytical investigations and simulations of highly populated swarms demonstrate the claimed efficiency and coherence.
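
The abstract states that each agent joins the cluster with the lowest cost of local communication but does not define that cost. The Python sketch below assumes, purely for illustration, that the cost is the mean Euclidean distance from the agent to the cluster members inside its communication radius, so each agent decides using only the positions of peers it can sense. The names `local_cost`, `assign`, and `radius` are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def local_cost(agent: Point, members: List[Point], radius: float) -> float:
    """Hypothetical cost: mean distance to the cluster members within `radius`
    (the peers the agent can actually communicate with); inf if none are reachable."""
    dists = [math.dist(agent, m) for m in members]
    reachable = [d for d in dists if d <= radius]
    return sum(reachable) / len(reachable) if reachable else math.inf


def assign(agent: Point, clusters: Dict[str, List[Point]], radius: float) -> Optional[str]:
    """Return the cluster with the lowest local cost, or None if no cluster is reachable."""
    best, best_cost = None, math.inf
    for name, members in clusters.items():
        cost = local_cost(agent, members, radius)
        if cost < best_cost:
            best, best_cost = name, cost
    return best


if __name__ == "__main__":
    clusters = {"c1": [(0.0, 0.0), (1.0, 0.0)], "c2": [(5.0, 5.0), (6.0, 5.0)]}
    print(assign((1.0, 1.0), clusters, radius=3.0))  # c1: c2 is out of range
```

An agent that cannot sense any member of any cluster gets `None`; how such an agent should be handled (for example, by founding a new cluster) is not described in the abstract.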

A spatially constrained clustering algorithm with no prior knowledge of the number of clusters

NeuroImage, Vol 13 (6), pp. 61, 2001
DOI: 10.1016/s1053-8119(01)91404-1
Cited By ~ 1

Author(s): Rita Almeida, Anders Ledberg

Keyword(s): Prior Knowledge, Clustering Algorithm, Constrained Clustering, Number Of Clusters, Spatially Constrained Clustering

Clustering Genes Using Heterogeneous Data Sources

International Journal of Knowledge Discovery in Bioinformatics, Vol 1 (2), pp. 12-28, 2010
DOI: 10.4018/jkdb.2010040102
Cited By ~ 3

Author(s): Erliang Zeng, Chengyong Yang, Tao Li, Giri Narasimhan

Keyword(s): Gene Expression, Gene Expression Data, Incomplete Data, Clustering Algorithm, Biological Data, Exploratory Analysis, Data Sources, Modular Organization, Constrained Clustering, Expression Data

Clustering of gene expression data is a standard exploratory technique used to identify closely related genes. Many other sources of data are also likely to be of great help in analyzing gene expression data, and these data provide a means to begin elucidating the large-scale modular organization of the cell. The authors consider the challenging task of developing exploratory analytical techniques that handle multiple complete and incomplete information sources. The Multi-Source Clustering (MSC) algorithm they developed performs clustering with multiple, but complete, sources of data. To handle incomplete data sources, the authors adopted the MPCK-means clustering algorithm to perform exploratory analysis on one complete source, with the other, potentially incomplete, sources provided in the form of constraints. This paper presents the new clustering algorithm MSC for exploratory analysis using two or more diverse but complete data sources, studies the effectiveness of constraint sets and the robustness of the constrained clustering algorithm using multiple sources of incomplete biological data, and incorporates such incomplete data into the constrained clustering algorithm in the form of constraint sets.
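
MPCK-means combines pairwise must-link/cannot-link penalties with metric learning. The Python sketch below shows only a constraint-penalized assignment step in the style of PCK-means (metric learning is omitted), assuming penalty weights `w` and `w_bar`; it illustrates how constraints derived from an incomplete source can pull a point away from its nearest centroid. The function names and parameters are illustrative, not the authors' code.

```python
from typing import Dict, List, Optional, Sequence


def sq_dist(x: Sequence[float], c: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def assign_point(
    i: int,
    X: List[Sequence[float]],
    centroids: List[Sequence[float]],
    labels: List[Optional[int]],
    must: Dict[int, List[int]],
    cannot: Dict[int, List[int]],
    w: float = 1.0,
    w_bar: float = 1.0,
) -> int:
    """Pick the cluster h minimizing ||x_i - mu_h||^2 plus a penalty w for each
    must-link partner currently assigned elsewhere and w_bar for each
    cannot-link partner currently assigned to h (None = not yet assigned)."""
    best_h, best_cost = 0, float("inf")
    for h, mu in enumerate(centroids):
        cost = sq_dist(X[i], mu)
        cost += w * sum(1 for j in must.get(i, []) if labels[j] is not None and labels[j] != h)
        cost += w_bar * sum(1 for j in cannot.get(i, []) if labels[j] == h)
        if cost < best_cost:
            best_h, best_cost = h, cost
    return best_h


if __name__ == "__main__":
    X = [[0.0], [0.45], [1.0]]
    centroids = [[0.0], [1.0]]
    labels = [0, None, 1]
    print(assign_point(1, X, centroids, labels, {}, {}))        # 0: nearest centroid
    print(assign_point(1, X, centroids, labels, {1: [2]}, {}))  # 1: must-link to point 2 wins
```

Because the penalties are finite, the constraints are soft: with a small `w` (for example 0.05 here) the distance term dominates again and point 1 stays in cluster 0.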

Clustering Genes Using Heterogeneous Data Sources

Computational Knowledge Discovery for Bioinformatics Research, pp. 67-83, 2013
DOI: 10.4018/978-1-4666-1785-8.ch005

Author(s): Erliang Zeng, Chengyong Yang, Tao Li, Giri Narasimhan

Keyword(s): Gene Expression, Gene Expression Data, Incomplete Data, Clustering Algorithm, Clustering Algorithms, Exploratory Analysis, Data Sources, Constrained Clustering, Expression Data, Multiple Sources

Clustering of gene expression data is a standard exploratory technique used to identify closely related genes. Many other sources of data are also likely to be of great help in analyzing gene expression data, and these data provide a means to begin elucidating the large-scale modular organization of the cell. The authors consider the challenging task of developing exploratory analytical techniques that handle multiple complete and incomplete information sources. The Multi-Source Clustering (MSC) algorithm they developed performs clustering with multiple, but complete, sources of data. To handle incomplete data sources, the authors adopted the MPCK-means clustering algorithm to perform exploratory analysis on one complete source, with the other, potentially incomplete, sources provided in the form of constraints. This paper presents the new clustering algorithm MSC for exploratory analysis using two or more diverse but complete data sources, studies the effectiveness of constraint sets and the robustness of the constrained clustering algorithm using multiple sources of incomplete biological data, and incorporates such incomplete data into the constrained clustering algorithm in the form of constraint sets.

A Constrained Clustering Algorithm for the Location of Express Shops

2020 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM), 2020
DOI: 10.1109/ieem45057.2020.9309889

Author(s): X. Zhang, X. Liu, J. Jiang

Keyword(s): Clustering Algorithm, Constrained Clustering

How to Use Temporal-Driven Constrained Clustering to Detect Typical Evolutions

International Journal of Artificial Intelligence Tools, Vol 23 (04), pp. 1460013, 2014
DOI: 10.1142/s0218213014600136
Cited By ~ 2

Author(s): Marian-Andrei Rizoiu, Julien Velcin, Stéphane Lallich

Keyword(s): Clustering Algorithm, Significant Loss, Multidimensional Space, Normal Distribution Function, Constrained Clustering, Temporal Dimension, Penalty Term, Temporal Space, Political Studies, Time Aware

In this paper, we propose a new time-aware dissimilarity measure that takes the temporal dimension into account. Observations that are close in the description space but distant in time are considered dissimilar. We also propose a method to enforce segmentation contiguity by introducing into the objective function a penalty term inspired by the Normal Distribution Function. We combine the two propositions into a novel time-driven constrained clustering algorithm, called TDCK-Means, which creates a partition of coherent clusters, both in the multidimensional space and in the temporal space. This algorithm uses soft semi-supervised constraints to encourage adjacent observations belonging to the same entity to be assigned to the same cluster. We apply our algorithm to a Political Studies dataset to detect typical evolution phases. We adapt the Shannon entropy to measure entity contiguity, and we show that our proposition consistently improves the temporal cohesion of clusters without any significant loss in multidimensional variance.
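
The abstract does not spell out the formulas, so the following Python sketch uses assumed forms to illustrate the three ingredients: a time-aware dissimilarity that is large when two observations are far apart in either the description space or time, a Gaussian-shaped penalty for separating temporally close observations of the same entity, and a per-entity Shannon entropy of cluster labels as a contiguity score. All function names and the parameters `dx_max`, `dt_max`, `beta`, and `delta` are hypothetical; the paper's exact definitions may differ.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def time_aware_dissim(xi: Sequence[float], xj: Sequence[float], ti: float, tj: float,
                      dx_max: float, dt_max: float) -> float:
    """Assumed form: observations are dissimilar if far apart in either the
    description space or in time. Lies in [0, 1] when the squared distances
    are bounded by dx_max**2 and dt_max**2."""
    dx = sum((a - b) ** 2 for a, b in zip(xi, xj)) / dx_max ** 2
    dt = (ti - tj) ** 2 / dt_max ** 2
    return 1.0 - (1.0 - dx) * (1.0 - dt)


def contiguity_penalty(ti: float, tj: float, beta: float = 1.0, delta: float = 1.0) -> float:
    """Hypothetical Gaussian-shaped penalty for assigning two observations of
    the same entity to different clusters: largest when they are close in time."""
    return beta * math.exp(-0.5 * ((ti - tj) / delta) ** 2)


def mean_entity_entropy(assignments: Dict[str, List[int]]) -> float:
    """Average Shannon entropy (nats) of each entity's cluster labels;
    0 means every entity stays in a single cluster."""
    total = 0.0
    for labels in assignments.values():
        n = len(labels)
        total -= sum(c / n * math.log(c / n) for c in Counter(labels).values())
    return total / len(assignments)


if __name__ == "__main__":
    print(time_aware_dissim([0.0], [0.0], 0.0, 10.0, dx_max=1.0, dt_max=10.0))  # 1.0
    print(mean_entity_entropy({"a": [0, 0, 0], "b": [0, 1]}))
```

With this form, two observations with identical descriptions but a time gap of `dt_max` reach the maximal dissimilarity of 1, matching the abstract's statement that observations distant in time are treated as dissimilar.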
