Comparison of centroid-based clustering algorithms in the context of divide and conquer paradigm based FMST framework

Author(s): Sabhijiit S. Sandhu, Ashwin R. Jadhav, B.K. Tripathy

Author(s): Juanjuan Luo, Huadong Ma, Dongqing Zhou

The similarity matrix has a significant effect on the performance of spectral clustering, and determining the neighborhood in the similarity matrix effectively is one of its main difficulties. In this paper, a “divide and conquer” strategy is proposed to model the similarity matrix construction task using a multiobjective evolutionary algorithm (MOEA). The whole procedure is divided into two phases: Phase I determines the nonzero entries of the similarity matrix, and Phase II determines the values of those nonzero entries. In Phase I, the main contribution is that we model the task as a biobjective dynamic optimization problem that optimizes diversity and similarity simultaneously. Each individual determines one nonzero entry for each sample, so the encoding length decreases to O(N), in contrast with non-ensemble multiobjective spectral clustering. In addition, a specific initialization operator and a diversity preservation strategy are proposed for this phase. In Phase II, three ensemble strategies are designed to determine the values of the nonzero entries of the similarity matrix. Furthermore, this Pareto ensemble framework is extended to semi-supervised clustering by transforming the semi-supervised information into constraints. In contrast with previous multiobjective evolutionary spectral clustering algorithms, the proposed Pareto ensemble-based framework strikes a balance between time cost and clustering accuracy, as demonstrated in the experiments section.
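To make the O(N) encoding concrete, here is a minimal sketch in Python with NumPy. It assumes that an individual is an integer sequence `neighbors` of length N in which `neighbors[i]` names the one nonzero entry chosen for sample i (Phase I), that each selected entry takes a Gaussian kernel value with width `sigma` (Phase II), and that the ensemble simply averages the matrices of several individuals, for example those on a Pareto front. The function names, the kernel, and the averaging rule are illustrative assumptions, not the paper's three ensemble strategies.

```python
import numpy as np


def similarity_from_individual(X, neighbors, sigma=1.0):
    """Build a symmetric sparse similarity matrix from one length-N individual.

    neighbors[i] is the index of the single nonzero entry chosen for sample i
    (hypothetical Phase I encoding); the entry's value is a Gaussian kernel
    (hypothetical Phase II valuation).
    """
    n = X.shape[0]
    W = np.zeros((n, n))
    for i, j in enumerate(neighbors):
        if i == j:
            raise ValueError("a sample cannot be its own neighbor")
        w = np.exp(-np.sum((X[i] - X[j]) ** 2) / (2.0 * sigma ** 2))
        W[i, j] = W[j, i] = w  # symmetrize so spectral methods can use W
    return W


def ensemble_similarity(X, population, sigma=1.0):
    """Average the similarity matrices of several individuals (assumed ensemble rule)."""
    return np.mean([similarity_from_individual(X, ind, sigma) for ind in population], axis=0)


# Usage: four 2-D samples and two individuals, each storing N = 4 integers.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
W = ensemble_similarity(X, [[1, 0, 3, 2], [2, 0, 3, 2]])
```

Each individual stores only N integers, while the full matrix it induces has N² cells, which is the memory saving the encoding relies on.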


Water, 2020, Vol 12 (4), pp. 1002
Author(s): Xuan Khoa Bui, Malvin S. Marlim, Doosun Kang

A water distribution network (WDN) is an indispensable element of civil infrastructure that provides fresh water for domestic use, industrial development, and firefighting. However, in a large and complex network, operation and management (O&M) can be challenging. As a technical initiative to improve O&M efficiency, the “divide and conquer” paradigm can be used to divide an original WDN into multiple subnetworks. Each subnetwork, known as a district metered area (DMA), is bounded by pipes fitted with gate valves or flow meters that control the water volume entering and leaving it. Many approaches to creating DMAs are formulated as two-phase procedures, clustering followed by sectorization, and are collectively called water network partitioning (WNP). To assess the benefits and drawbacks of DMAs in a WDN, we provide a comprehensive review of various state-of-the-art approaches, which can be broadly classified as: (1) clustering algorithms, which focus on defining the optimal configuration of DMAs; and (2) sectorization procedures, which physically decompose the network by selecting pipes for installing flow meters or gate valves. We also provide an overview of emerging problems that need to be studied.
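The two-phase clustering-then-sectorization procedure can be illustrated with a toy sketch in Python. Here the clustering phase is a simple multi-source breadth-first search that assigns each junction to the nearest seed by hop count, and the sectorization phase lists the pipes whose end nodes fall in different DMAs, i.e., candidate locations for flow meters or gate valves. The network, the seed-based heuristic, and the function names are hypothetical; real WNP methods weigh hydraulics, demand, and pressure, which this sketch ignores.

```python
from collections import defaultdict, deque


def build_adjacency(pipes):
    """Undirected adjacency lists from a {pipe_id: (node_u, node_v)} mapping."""
    adjacency = defaultdict(list)
    for u, v in pipes.values():
        adjacency[u].append(v)
        adjacency[v].append(u)
    return adjacency


def cluster_by_seeds(adjacency, seeds):
    """Phase 1 (clustering): assign each node to the nearest seed by hop count."""
    dma_of = {}
    queue = deque()
    for label, seed in enumerate(seeds):
        dma_of[seed] = label
        queue.append(seed)
    while queue:  # multi-source BFS grows all DMAs in lockstep
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dma_of:
                dma_of[v] = dma_of[u]
                queue.append(v)
    return dma_of


def boundary_pipes(pipes, dma_of):
    """Phase 2 (sectorization): pipes connecting two different DMAs."""
    return sorted(pid for pid, (u, v) in pipes.items() if dma_of[u] != dma_of[v])


# Usage on a hypothetical six-junction network with one loop-closing pipe (p6).
pipes = {"p1": ("A", "B"), "p2": ("B", "C"), "p3": ("C", "D"),
         "p4": ("D", "E"), "p5": ("E", "F"), "p6": ("B", "E")}
dma_of = cluster_by_seeds(build_adjacency(pipes), seeds=["A", "F"])
cut = boundary_pipes(pipes, dma_of)
```

In a fuller method, each pipe in `cut` would then be assigned either a flow meter or a closed gate valve, subject to hydraulic constraints.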


2021, Vol 7 (4), pp. 1-41
Author(s): Radu Mariescu-Istodor, Alexandru Cristian, Mihai Negrea, Peiwei Cao

The Vehicle Routing Problem (VRP) is an NP-hard problem in which itineraries must be optimized for agents visiting multiple targets. When real-world travel is considered (road-network topology, speed limits, and traffic), modern VRP solvers can only process small instances with a few hundred targets. We propose a framework (VRPDiv) that can scale any solver to support larger VRP instances with up to ten thousand (10k) targets by dividing them into smaller clusters. VRPDiv supports multiple VRP scenarios and contains a pool of clustering algorithms from which it chooses the most suitable one depending on the properties of the instance. VRPDiv assigns agents based on cluster demand and target compatibility (i.e., realizable time windows and capacity limitations). We incorporate the framework into the Bing Maps Multi-Itinerary Optimization (MIO) online service. This architecture allows MIO to scale up from solving instances with a few hundred targets to over 10k targets in under 10 minutes. We evaluate our framework on public datasets and also publish a new dataset, since sufficiently large instances supporting real-world travel could not be found. We investigate multiple clustering methods and show that choosing the correct one is critical, with differences of up to 60% in quality. We compare with relevant baselines and report a 40% improvement in target allocation and a 9.8% improvement in itinerary durations. We compare with existing scores and report an average delta of 10%, with lower values (<5%) on instances with low workload (few targets per agent), which is acceptable for an online service.
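A rough sketch of the divide step, assuming k-means as the one clustering algorithm in the pool and assuming agents are split across clusters in proportion to cluster demand (largest-remainder rounding, at least one agent per cluster). Both choices, and the names `kmeans` and `allocate_agents`, are hypothetical; VRPDiv's actual selection logic and its time-window and capacity compatibility checks are not modeled here.

```python
import numpy as np


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on target coordinates (one hypothetical pool member)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Keep the old center if a cluster becomes empty.
        new_centers = np.array([
            points[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def allocate_agents(demands, n_agents):
    """Split agents across clusters in proportion to demand (assumed rule).

    Every cluster gets one agent first; the rest are distributed by the
    largest-remainder method so the total is exactly n_agents.
    """
    demands = np.asarray(demands, dtype=float)
    k = len(demands)
    if n_agents < k:
        raise ValueError("need at least one agent per cluster")
    quotas = demands / demands.sum() * (n_agents - k)
    alloc = 1 + np.floor(quotas).astype(int)
    leftover = n_agents - alloc.sum()
    order = np.argsort(-(quotas - np.floor(quotas)), kind="stable")
    alloc[order[:leftover]] += 1
    return alloc
```

Each cluster, with its allocated agents, can then be handed to the unchanged underlying solver, which is what lets the framework wrap "any solver".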


2021, Vol 118 (44), pp. e2100482118
Author(s): Soumendu Sundar Mukherjee, Purnamrita Sarkar, Peter J. Bickel

In this article, we advance divide-and-conquer strategies for solving the community detection problem in networks. We propose two algorithms that perform clustering on several small subgraphs and finally patch the results into a single clustering. The main advantage of these algorithms is that they significantly reduce the computational cost of traditional algorithms, including spectral clustering, semidefinite programs, modularity-based methods, and likelihood-based methods, without losing accuracy, and at times even improving it. These algorithms are also parallelizable by nature. Since most traditional algorithms are accurate, and their optimization problems are much simpler on small instances, our divide-and-conquer methods provide an omnibus recipe for scaling traditional algorithms up to large networks. We prove the consistency of these algorithms under various subgraph selection procedures and perform extensive simulations and real-data analysis to understand the advantages of the divide-and-conquer approach in various settings.
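The divide-and-conquer idea (cluster small subgraphs, then patch the results) can be sketched in Python with NumPy. This simplified illustration is not either of the authors' two algorithms: it assumes k = 2, uses spectral bisection (the sign of the graph Laplacian's Fiedler vector) as the base clusterer on each induced subgraph, and patches by permuting each local labeling so that it agrees best with the labels already assigned on the overlapping nodes. The subgraphs are assumed connected and overlapping; all function names are hypothetical.

```python
import itertools

import numpy as np


def spectral_bisection(A_sub):
    """Base clusterer (k = 2): split by the sign of the Fiedler vector of L = D - A."""
    L = np.diag(A_sub.sum(axis=1)) - A_sub
    _, vecs = np.linalg.eigh(L)  # eigenvalues ascending; column 1 is the Fiedler vector
    return (vecs[:, 1] > 0).astype(int)


def align_labels(reference, local, k):
    """Relabel `local` with the permutation that best matches `reference` on shared nodes."""
    overlap = [v for v in local if v in reference]
    best = max(itertools.permutations(range(k)),
               key=lambda p: sum(reference[v] == p[local[v]] for v in overlap))
    return {v: best[c] for v, c in local.items()}


def divide_and_conquer(A, subsets, k=2):
    """Cluster each induced subgraph independently, then patch into one labeling."""
    global_labels = {}
    for nodes in subsets:
        sub = A[np.ix_(nodes, nodes)]
        local = {v: int(c) for v, c in zip(nodes, spectral_bisection(sub))}
        if global_labels:
            local = align_labels(global_labels, local, k)
        for v, c in local.items():
            global_labels.setdefault(v, c)  # first assignment wins
    return global_labels


# Usage: two 6-node cliques joined by the edge (5, 6); two overlapping subgraphs.
A = np.zeros((12, 12))
for block in (range(0, 6), range(6, 12)):
    for i, j in itertools.combinations(block, 2):
        A[i, j] = A[j, i] = 1
A[5, 6] = A[6, 5] = 1
labels = divide_and_conquer(A, [[0, 1, 2, 5, 6, 7, 8], [3, 4, 5, 6, 9, 10, 11]])
```

Because each local clustering runs on its own subgraph, the loop body is independent per subset and could run in parallel, with only the patching step done sequentially. Trying every permutation is practical only for small k; larger k would call for an assignment solver.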


Author(s): Mohana Priya K, Pooja Ragavi S, Krishna Priya G

Clustering is the process of grouping objects into subsets that are meaningful in the context of a particular problem. It does not rely on predefined classes. It is referred to as an unsupervised learning method because no information is provided about the "right answer" for any of the objects. Many clustering algorithms have been proposed, and they are chosen according to the application. Sentence clustering is one of the best clustering techniques. A hierarchical clustering algorithm is applied at multiple levels to improve accuracy. For tagging, a part-of-speech (POS) tagger and the Porter stemmer are used. The WordNet dictionary is used to determine similarity through the Jiang-Conrath and cosine similarity measures. Grouping is performed according to the highest similarity value, subject to a mean threshold. This paper incorporates many parameters for finding the similarity between words. To identify disambiguated words, sense identification is performed for adjectives, followed by a comparison. The SemCor and machine learning datasets are employed. Compared with previous results for word sense disambiguation (WSD), our work shows a substantial improvement, achieving 91.2%.
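The grouping rule (link each sentence to its most similar partner when that similarity exceeds the mean threshold) can be sketched as follows. This is a simplified illustration: it uses only cosine similarity over whitespace-split bag-of-words tokens and omits POS tagging, Porter stemming, WordNet, and the Jiang-Conrath measure. Taking the threshold as the mean pairwise similarity, and merging linked sentences with union-find, are assumptions; the function names are hypothetical and this is not the paper's implementation.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two bag-of-words token lists."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def group_by_best_match(sentences):
    """Return a group id per sentence; requires at least two sentences."""
    tokens = [s.lower().split() for s in sentences]
    n = len(tokens)
    sim = [[cosine_similarity(tokens[i], tokens[j]) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    pairs = [sim[i][j] for i in range(n) for j in range(i + 1, n)]
    threshold = sum(pairs) / len(pairs)  # mean pairwise similarity (assumed)

    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        best = max((c for c in range(n) if c != i), key=lambda c: sim[i][c])
        if sim[i][best] > threshold:
            parent[find(i)] = find(best)  # merge i with its best match
    return [find(i) for i in range(n)]


groups = group_by_best_match(["bank loan interest", "bank loan rate",
                              "river water mud", "river water fish"])
```

Swapping `cosine_similarity` for a WordNet-based measure would leave the grouping logic unchanged, which is why the similarity function is kept separate here.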


2017, Vol 5 (12), pp. 323-325
Author(s): E. Mahima Jane, E. George Dharma Prakash Raj

2015, pp. 125-138
Author(s): I. V. Goncharenko

In this article, we propose a new method of non-hierarchical cluster analysis using a k-nearest-neighbor graph and discuss it with respect to vegetation classification. The method of k-nearest neighbor (k-NN) classification was originally developed in 1951 (Fix, Hodges, 1951). Later, the term “k-NN graph” and several k-NN clustering algorithms appeared (Cover, Hart, 1967; Brito et al., 1997). In biology, k-NN is used in the analysis of protein structures and genome sequences. Most k-NN clustering algorithms first build an “excessive” graph, the so-called hypergraph, and then truncate it into subgraphs by partitioning and coarsening the hypergraph. We developed a different strategy, “upward” clustering, which forms (sequentially assembles) one cluster after another. Until now, graph-based cluster analysis has not been considered for the classification of vegetation datasets.
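For orientation, here is a minimal sketch of a generic k-NN graph clustering in Python with NumPy: find each point's k nearest neighbors under Euclidean distance, keep only mutual neighbor pairs as edges, and report connected components as clusters. This is a common baseline, not Goncharenko's “upward” method, whose assembly rules this abstract does not describe; the function names and the mutual-neighbor rule are assumptions.

```python
import numpy as np


def knn_indices(X, k):
    """Indices of each point's k nearest neighbors (Euclidean, excluding itself)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    return np.argsort(d, axis=1)[:, :k]


def mutual_knn_components(X, k):
    """Cluster labels = connected components of the mutual k-NN graph."""
    neighbor_sets = [set(int(j) for j in row) for row in knn_indices(X, k)]
    n = len(X)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first traversal over mutual edges only
            u = stack.pop()
            for v in neighbor_sets[u]:
                if labels[v] == -1 and u in neighbor_sets[v]:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels


X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
clusters = mutual_knn_components(X, k=2)
```

Requiring mutual neighbors prunes the "excessive" one-directional edges up front, which is one simple alternative to building a full graph and truncating it afterwards.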

