Comparison of centroid-based clustering algorithms in the context of divide and conquer paradigm based FMST framework

Abstract: The similarity matrix has a significant effect on the performance of spectral clustering, and determining the neighborhood in the similarity matrix effectively is one of its main difficulties. In this paper, a “divide and conquer” strategy is proposed to model the similarity matrix construction task using a multiobjective evolutionary algorithm (MOEA). The whole procedure is divided into two phases: Phase I determines which entries of the similarity matrix are nonzero, and Phase II determines the values of those nonzero entries. In Phase I, the main contribution is that we model the task as a biobjective dynamic optimization problem, which optimizes diversity and similarity at the same time. Each individual determines one nonzero entry for each sample, so the encoding length decreases to O(N) compared with non-ensemble multiobjective spectral clustering. In addition, a specific initialization operator and a diversity preservation strategy are proposed for this phase. In Phase II, three ensemble strategies are designed to determine the values of the nonzero entries of the similarity matrix. Furthermore, this Pareto ensemble framework is extended to semi-supervised clustering by transforming the semi-supervised information into constraints. In contrast with previous multiobjective evolutionary spectral clustering algorithms, the proposed Pareto ensemble-based framework strikes a balance between time cost and clustering accuracy, as demonstrated in the experiments section.
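The two-phase split described in this abstract can be illustrated with a much simpler stand-in. The sketch below is not the paper's MOEA: it assumes a plain k-nearest-neighbour rule for Phase I (which entries are nonzero) and a Gaussian kernel for Phase II (what values they take). The function name `sparse_similarity` and the parameters `k` and `sigma` are hypothetical.

```python
import math

def sparse_similarity(points, k=2, sigma=1.0):
    """Two-phase sketch: pick the nonzero entries, then fill in their values."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Phase I (assumed rule): each sample keeps its k nearest neighbours.
    support = set()
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        for j in others[:k]:
            support.add((i, j))
            support.add((j, i))  # keep the matrix symmetric

    # Phase II (assumed rule): Gaussian kernel value on the chosen entries only.
    sim = [[0.0] * n for _ in range(n)]
    for i, j in support:
        sim[i][j] = math.exp(-dist[i][j] ** 2 / (2 * sigma ** 2))
    return sim

sim = sparse_similarity([(0, 0), (0, 1), (5, 5), (5, 6)], k=1)
```

With `k=1`, each point in the example is linked only to its closest neighbour, so entries between the two distant pairs stay at zero.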

Water Network Partitioning into District Metered Areas: A State-Of-The-Art Review

Water ◽

10.3390/w12041002 ◽

2020 ◽

Vol 12 (4) ◽

pp. 1002 ◽

Cited By ~ 6

Author(s):

Xuan Khoa Bui ◽

Malvin S. Marlim ◽

Doosun Kang

Keyword(s):

Industrial Development ◽

State Of The Art ◽

Clustering Algorithms ◽

Divide And Conquer ◽

Water Volume ◽

Network Partitioning ◽

Water Network ◽

Two Phase ◽

Flow Meters ◽

Water Network Partitioning

A water distribution network (WDN) is an indispensable element of civil infrastructure that provides fresh water for domestic use, industrial development, and fire-fighting. However, in a large and complex network, operation and management (O&M) can be challenging. As a technical initiative to improve O&M efficiency, the paradigm of “divide and conquer” can divide an original WDN into multiple subnetworks. Each subnetwork is controlled by boundary pipes installed with gate valves or flow meters that control the water volume entering and leaving what are known as district metered areas (DMAs). Many approaches to creating DMAs are formulated as two-phase procedures, clustering and sectorizing, and are called water network partitioning (WNP) in general. To assess the benefits and drawbacks of DMAs in a WDN, we provide a comprehensive review of various state-of-the-art approaches, which can be broadly classified as: (1) Clustering algorithms, which focus on defining the optimal configuration of DMAs; and (2) sectorization procedures, which physically decompose the network by selecting pipes for installing flow meters or gate valves. We also provide an overview of emerging problems that need to be studied.
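As a rough illustration of how the clustering phase feeds the sectorization phase, the sketch below assumes that a clustering step has already assigned every node to a DMA and then lists the boundary pipes, that is, the pipes whose two end nodes fall in different DMAs. The names `boundary_pipes` and `dma_of` are hypothetical, and deciding which boundary pipe gets a flow meter or a gate valve is left out.

```python
def boundary_pipes(pipes, dma_of):
    """Return the pipes whose end nodes belong to different DMAs."""
    return [(u, v) for u, v in pipes if dma_of[u] != dma_of[v]]

# Toy network: a chain of four nodes split into two DMAs.
pipes = [("a", "b"), ("b", "c"), ("c", "d")]
dma_of = {"a": 0, "b": 0, "c": 1, "d": 1}
candidates = boundary_pipes(pipes, dma_of)
```

Here `candidates` contains the single pipe between nodes `b` and `c`, the only link crossing the DMA boundary.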

VRPDiv: A Divide and Conquer Framework for Large Vehicle Routing Problems

ACM Transactions on Spatial Algorithms and Systems ◽

10.1145/3474832 ◽

2021 ◽

Vol 7 (4) ◽

pp. 1-41

Author(s):

Radu Mariescu-Istodor ◽

Alexandru Cristian ◽

Mihai Negrea ◽

Peiwei Cao

Keyword(s):

Vehicle Routing ◽

Real World ◽

Time Windows ◽

Clustering Algorithms ◽

Scale Up ◽

Divide And Conquer ◽

Clustering Methods ◽

Online Service ◽

Routing Problem ◽

Public Datasets

The Vehicle Routing Problem (VRP) is an NP-hard problem in which itineraries must be optimized for agents that visit multiple targets. When considering real-world travel (road-network topology, speed limits, and traffic), modern VRP solvers can only process small instances with a few hundred targets. We propose a framework (VRPDiv) that can scale any solver to support larger VRP instances with up to ten thousand (10k) targets by dividing them into smaller clusters. VRPDiv supports multiple VRP scenarios and contains a pool of clustering algorithms from which it chooses the ideal one depending on the properties of the instance. VRPDiv assigns agents based on cluster demand and target compatibility (i.e., realizable time windows and capacity limitations). We incorporate the framework into the Bing Maps Multi-Itinerary Optimization (MIO) online service. This architecture allows MIO to scale up from solving instances with a few hundred targets to over 10k targets in under 10 minutes. We evaluate our framework on public datasets and publish a new dataset ourselves, as large enough instances supporting real-world travel were impossible to find. We investigate multiple clustering methods and show that choosing the correct one is critical, with differences of up to 60% in quality. We compare with relevant baselines and report a 40% improvement in target allocation and a 9.8% improvement in itinerary durations. We compare with existing scores and report an average delta of 10%, with lower values (<5%) in instances with low workload (few targets per agent), which are acceptable for an online service.
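The cluster-first, route-second idea can be sketched in a few lines. This is not VRPDiv itself: it assumes straight-line distances, splits targets into angular sectors around the depot as a stand-in for the framework's pool of clustering algorithms, and uses a nearest-neighbour heuristic as a stand-in for the underlying solver. All function names are hypothetical.

```python
import math

def nearest_neighbour_route(depot, targets):
    """Stand-in solver: always visit the closest remaining target next."""
    route, current, remaining = [], depot, list(targets)
    while remaining:
        nxt = min(remaining, key=lambda t: math.dist(current, t))
        remaining.remove(nxt)
        route.append(nxt)
        current = nxt
    return route

def divide_and_route(depot, targets, n_agents):
    """Divide targets into angular sectors, then route each sector separately."""
    if not targets:
        return []
    # Divide: order targets by polar angle around the depot, cut into chunks.
    ordered = sorted(
        targets, key=lambda t: math.atan2(t[1] - depot[1], t[0] - depot[0])
    )
    size = math.ceil(len(ordered) / n_agents)
    clusters = [ordered[i:i + size] for i in range(0, len(ordered), size)]
    # Conquer: solve each small cluster independently.
    return [nearest_neighbour_route(depot, c) for c in clusters]

routes = divide_and_route((0, 0), [(1, 0), (2, 0), (-1, 0), (-2, 0)], n_agents=2)
```

In the toy call, the two targets east of the depot form one route and the two targets west of it form the other. Time windows, capacities, and road-network travel times are deliberately ignored here.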

Two provably consistent divide-and-conquer clustering algorithms for large networks

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.2100482118 ◽

2021 ◽

Vol 118 (44) ◽

pp. e2100482118

Author(s):

Soumendu Sundar Mukherjee ◽

Purnamrita Sarkar ◽

Peter J. Bickel

Keyword(s):

Spectral Clustering ◽

Optimization Problems ◽

Clustering Algorithms ◽

Computational Cost ◽

Real Data ◽

Divide And Conquer ◽

Detection Problem ◽

Large Networks ◽

Selection Procedures ◽

Improving Accuracy

In this article, we advance divide-and-conquer strategies for solving the community detection problem in networks. We propose two algorithms that perform clustering on several small subgraphs and finally patch the results into a single clustering. The main advantage of these algorithms is that they significantly bring down the computational cost of traditional algorithms, including spectral clustering, semidefinite programs, modularity-based methods, likelihood-based methods, etc., without losing accuracy, and even improving accuracy at times. These algorithms are also, by nature, parallelizable. Since most traditional algorithms are accurate, and the corresponding optimization problems are much simpler in small problems, our divide-and-conquer methods provide an omnibus recipe for scaling traditional algorithms up to large networks. We prove the consistency of these algorithms under various subgraph selection procedures and perform extensive simulations and real-data analysis to understand the advantages of the divide-and-conquer approach in various settings.
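The general pattern of clustering several small subgraphs and patching the results together can be sketched as follows. This is only an illustration under strong assumptions, not either of the paper's two algorithms: connected components stand in for the local clustering method, and local clusters are aligned to global labels by a majority vote over nodes that were already labelled by earlier subgraphs. The names `components` and `patch` are hypothetical.

```python
from collections import Counter

def components(nodes, edges):
    """Stand-in local clusterer: connected components of the induced subgraph."""
    nodes = set(nodes)
    adj = {v: set() for v in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            adj[u].add(v)
            adj[v].add(u)
    seen, out = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                comp.add(v)
                stack.extend(adj[v] - seen)
        out.append(comp)
    return out

def patch(subgraphs, edges):
    """Cluster each subgraph, then merge local clusters into one labelling."""
    labels, next_label = {}, 0
    for nodes in subgraphs:
        for comp in components(nodes, edges):
            # Align with earlier results through the overlapping nodes.
            votes = Counter(labels[v] for v in comp if v in labels)
            if votes:
                label = votes.most_common(1)[0][0]
            else:
                label, next_label = next_label, next_label + 1
            for v in comp:
                labels.setdefault(v, label)
    return labels

edges = [(0, 1), (1, 2), (3, 4), (4, 5)]
labels = patch([[0, 1, 3, 4], [1, 2, 4, 5]], edges)
```

The two subgraphs overlap on nodes 1 and 4, which is what lets the second local clustering inherit the labels chosen for the first. Each call to `components` is independent of the others, which reflects the parallelizable nature mentioned in the abstract.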

Scaffold Safety Analysis: Focusing on Divide-and-Conquer Method

Construction Research Congress 2020 ◽

10.1061/9780784482865.023 ◽

2020 ◽

Author(s):

Sayan Sakhakarmi ◽

Chunhee Cho ◽

JeeWoong Park

Keyword(s):

Safety Analysis ◽

Divide And Conquer

Handling WSD using Hierarchical Clustering Algorithm with sentences

International Journal of Scientific Research in Science Engineering and Technology ◽

10.32628/ijsrset1841120 ◽

2018 ◽

pp. 83-88

Author(s):

Mohana Priya K ◽

Pooja Ragavi S ◽

Krishna Priya G

Keyword(s):

Hierarchical Clustering ◽

Similarity Measure ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Cosine Similarity Measure ◽

Hierarchical Clustering Algorithm ◽

Multiple Levels ◽

Pos Tagger ◽

Sentence Clustering ◽

The Right

Clustering is the process of grouping objects into subsets that have meaning in the context of a particular problem. It does not rely on predefined classes. It is referred to as an unsupervised learning method because no information is provided about the "right answer" for any of the objects. Many clustering algorithms have been proposed and are used in different applications. Sentence clustering is one of the best clustering techniques. A hierarchical clustering algorithm is applied at multiple levels for accuracy. A POS tagger and the Porter stemmer are used for tagging. The WordNet dictionary is used to determine similarity by invoking the Jiang-Conrath and cosine similarity measures. Grouping is performed with respect to the highest similarity value, with a mean threshold. This paper incorporates many parameters for finding the similarity between words. To identify the disambiguated words, sense identification is performed for the adjectives and a comparison is made. The SemCor and machine learning datasets are employed. Compared with previous results for WSD, our work shows a considerable improvement, reaching 91.2%.
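For readers unfamiliar with the cosine similarity measure mentioned above, the following minimal sketch computes it between two sentences. It assumes a plain bag-of-words representation with whitespace tokenization, which is a simplification: the paper's pipeline also involves POS tagging, stemming, and WordNet-based measures that are not reproduced here. The function name is hypothetical.

```python
import math
from collections import Counter

def cosine_similarity(sentence_a, sentence_b):
    """Cosine of the angle between two bag-of-words count vectors."""
    va = Counter(sentence_a.lower().split())
    vb = Counter(sentence_b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0

same = cosine_similarity("river bank", "river bank")
partial = cosine_similarity("the bank", "the river")
disjoint = cosine_similarity("river bank", "money loan")
```

Identical sentences score close to 1, sentences sharing no words score 0, and partially overlapping sentences fall in between. A clustering step could then merge the pair with the highest score whenever it exceeds a chosen threshold.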

Preimage Attacks on Reduced Troika with Divide-and-Conquer Methods

IEICE Transactions on Fundamentals of Electronics Communications and Computer Sciences ◽

10.1587/transfun.2019eap1166 ◽

2020 ◽

Vol E103.A (11) ◽

pp. 1260-1273

Author(s):

Fukang LIU ◽

Takanori ISOBE

Keyword(s):

Divide And Conquer

Survey on Partition based Clustering Algorithms in Big Data

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v5i12.323325 ◽

2017 ◽

Vol 5 (12) ◽

pp. 323-325

Author(s):

E. Mahima Jane ◽

E. George Dharma Prakash Raj

Keyword(s):

Big Data ◽

Clustering Algorithms

A SURVEY ON VARIED DISTRIBUTED CLUSTERING ALGORITHMS FOR WIRELESS SENSOR NETWORKS

i-manager's Journal on Communication Engineering and Systems ◽

10.26634/jcs.7.1.13963 ◽

2018 ◽

Vol 7 (1) ◽

pp. 35

Author(s):

PRADHAN SWAGATIKA ◽

PATNAIK PAWAN ◽

S.D. MISHRA ◽

...

Keyword(s):

Wireless Sensor Networks ◽

Sensor Networks ◽

Clustering Algorithms ◽

Wireless Sensor ◽

Distributed Clustering

DRSA: a non-hierarchical clustering algorithm using k-NN graph and its application in vegetation classification

Vegetation of Russia ◽

10.31111/vegrus/2015.27.125 ◽

2015 ◽

pp. 125-138 ◽

Cited By ~ 2

Author(s):

I. V. Goncharenko

Keyword(s):

Cluster Analysis ◽

Clustering Algorithm ◽

Nearest Neighbor ◽

Clustering Algorithms ◽

Protein Structures ◽

Hierarchical Cluster ◽

Vegetation Classification ◽

K Nearest Neighbor ◽

Neighbor Graph ◽

Nearest Neighbor Graph

In this article we propose a new method of non-hierarchical cluster analysis using a k-nearest-neighbor graph and discuss it with respect to vegetation classification. The method of k-nearest neighbor (k-NN) classification was originally developed in 1951 (Fix, Hodges, 1951). Later, the term “k-NN graph” and a few algorithms for k-NN clustering appeared (Cover, Hart, 1967; Brito et al., 1997). In biology, k-NN is used in the analysis of protein structures and genome sequences. Most k-NN clustering algorithms first build an “excessive” graph, a so-called hypergraph, and then truncate it into subgraphs by partitioning and coarsening the hypergraph. We developed a different strategy, “upward” clustering, which forms (assembles sequentially) one cluster after another. Until now, graph-based cluster analysis has not been considered for the classification of vegetation datasets.
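To make the k-NN graph idea concrete, the sketch below builds a directed k-NN graph and then assembles clusters one after another. It is not the DRSA algorithm: the growth rule (follow only mutual k-NN links from a seed) is an assumption chosen for brevity, and the function names are hypothetical.

```python
import math

def knn_graph(points, k):
    """Directed k-NN graph: each node maps to the set of its k nearest nodes."""
    graph = {}
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        graph[i] = set(others[:k])
    return graph

def grow_clusters(graph):
    """Assemble clusters sequentially, each grown from one seed node."""
    unassigned, clusters = set(graph), []
    while unassigned:
        seed = min(unassigned)
        unassigned.discard(seed)
        cluster, frontier = {seed}, [seed]
        while frontier:
            v = frontier.pop()
            for w in graph[v]:
                # Assumed rule: join only through mutual neighbour links.
                if w in unassigned and v in graph[w]:
                    unassigned.discard(w)
                    cluster.add(w)
                    frontier.append(w)
        clusters.append(cluster)
    return clusters

graph = knn_graph([(0,), (1,), (10,), (11,)], k=1)
clusters = grow_clusters(graph)
```

On the four one-dimensional points in the example, the two close pairs are each other's nearest neighbours, so two clusters are formed in sequence.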