Two provably consistent divide-and-conquer clustering algorithms for large networks

2021 ◽  
Vol 118 (44) ◽  
pp. e2100482118
Author(s):  
Soumendu Sundar Mukherjee ◽  
Purnamrita Sarkar ◽  
Peter J. Bickel

In this article, we advance divide-and-conquer strategies for solving the community detection problem in networks. We propose two algorithms that perform clustering on several small subgraphs and finally patch the results into a single clustering. The main advantage of these algorithms is that they significantly bring down the computational cost of traditional algorithms, including spectral clustering, semidefinite programs, modularity-based methods, likelihood-based methods, etc., without losing accuracy, and even improving accuracy at times. These algorithms are also, by nature, parallelizable. Since most traditional algorithms are accurate, and the corresponding optimization problems are much simpler in small problems, our divide-and-conquer methods provide an omnibus recipe for scaling traditional algorithms up to large networks. We prove the consistency of these algorithms under various subgraph selection procedures and perform extensive simulations and real-data analysis to understand the advantages of the divide-and-conquer approach in various settings.
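The cluster-subgraphs-then-patch idea described above can be illustrated with a small sketch. This is not the authors' algorithm: the base clustering step (connected components), the patching rule (thresholding averaged co-membership at 0.5), and all function and parameter names are assumptions chosen to keep the example short and dependency-free.

```python
import random
from itertools import combinations

def connected_components(nodes, edges):
    """Stand-in base clusterer: label each node by its connected component."""
    nodes = list(nodes)
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        if u in parent and v in parent:  # keep only edges inside this node set
            parent[find(u)] = find(v)
    return {v: find(v) for v in nodes}

def divide_and_conquer_cluster(n, edges, num_subgraphs=50, size=5, seed=0):
    """Cluster random induced subgraphs, then patch via averaged co-membership."""
    rng = random.Random(seed)
    sampled = {pair: 0 for pair in combinations(range(n), 2)}
    together = {pair: 0 for pair in combinations(range(n), 2)}
    for _ in range(num_subgraphs):
        nodes = sorted(rng.sample(range(n), size))   # divide: pick a small subgraph
        labels = connected_components(nodes, edges)  # conquer: cluster it
        for u, v in combinations(nodes, 2):
            sampled[(u, v)] += 1
            together[(u, v)] += labels[u] == labels[v]
    # Patch: link pairs that were co-clustered in most subgraphs containing both.
    agree = [p for p in sampled if sampled[p] and together[p] / sampled[p] > 0.5]
    return connected_components(range(n), agree)

# Toy network (assumed for illustration): two disjoint 4-cliques.
edges = list(combinations(range(4), 2)) + list(combinations(range(4, 8), 2))
labels = divide_and_conquer_cluster(8, edges)
```

Each subgraph is clustered independently, so the loop body could run in parallel, which is the property the abstract highlights. A realistic base clusterer (for example spectral clustering) would replace `connected_components` in the conquer step.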

Author(s):  
Juanjuan Luo ◽  
Huadong Ma ◽  
Dongqing Zhou

Abstract The similarity matrix has a significant effect on the performance of spectral clustering, and determining the neighborhood in the similarity matrix effectively is one of its main difficulties. In this paper, a “divide and conquer” strategy is proposed to model the similarity matrix construction task by adopting a multiobjective evolutionary algorithm (MOEA). The whole procedure is divided into two phases: Phase I aims to determine the nonzero entries of the similarity matrix, and Phase II aims to determine the values of those nonzero entries. In Phase I, the main contribution is that we model the task as a biobjective dynamic optimization problem, which optimizes the diversity and the similarity at the same time. Each individual determines one nonzero entry for each sample, so the encoding length decreases to O(N) in contrast with the non-ensemble multiobjective spectral clustering. In addition, a specific initialization operator and a diversity preservation strategy are proposed for this phase. In Phase II, three ensemble strategies are designed to determine the values of the nonzero entries of the similarity matrix. Furthermore, this Pareto ensemble framework is extended to semi-supervised clustering by transforming the semi-supervised information into constraints. In contrast with previous multiobjective evolutionary-based spectral clustering algorithms, the proposed Pareto ensemble-based framework strikes a balance between time cost and clustering accuracy, which is demonstrated in the experiments section.


Author(s):  
Hui Du ◽  
Yuping Wang ◽  
Xiaopan Dong

Clustering is a popular and effective method for image segmentation. However, existing clustering methods often suffer from the following problems: (1) they need a large amount of memory and computation when the input data are large; (2) they require some parameters (e.g., the number of clusters) to be set in advance, which greatly affects the clustering results. To save memory and computation, reduce the sensitivity to the parameters, and improve the effectiveness and efficiency of clustering, we construct a new clustering algorithm for image segmentation. The new algorithm consists of two phases: coarsening clustering and exact clustering. First, we use the Affinity Propagation (AP) algorithm for coarsening. Specifically, in order to save memory and computational cost, we only compute the similarity between each point and its t nearest neighbors, and obtain a condensed similarity matrix (with only t columns, where t << N and N is the number of data points). Second, to further improve the efficiency and effectiveness of the proposed algorithm, Self-tuning Spectral Clustering (SSC) is applied to the resulting points (the representative points obtained in the first phase) to perform the exact clustering. As a result, the proposed algorithm can quickly and precisely cluster texture images for segmentation. The experimental results show that the proposed algorithm is more efficient than the compared algorithms FCM, K-means and SOM.
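The condensed similarity matrix mentioned above (N rows, t columns) can be sketched as follows. The similarity used here, the negative squared Euclidean distance, is a common choice for Affinity Propagation but is an assumption, as are the function name and the brute-force neighbor search; the abstract does not specify these details.

```python
def condensed_similarity(points, t):
    """Return (neighbors, sims), two N x t tables.

    neighbors[i][k] is the index of the k-th nearest neighbor of point i,
    and sims[i][k] is the (assumed) similarity: negative squared distance.
    """
    neighbors, sims = [], []
    for i, p in enumerate(points):
        # Squared distance from point i to every other point, nearest first.
        scored = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points)
            if j != i
        )[:t]
        neighbors.append([j for _, j in scored])
        sims.append([-d for d, _ in scored])
    return neighbors, sims

# Four one-dimensional points, keeping t = 2 neighbors per point.
points = [(0.0,), (1.0,), (2.0,), (10.0,)]
neighbors, sims = condensed_similarity(points, t=2)
```

Only N x t similarities are stored instead of N x N. The neighbor search in this sketch still compares all pairs, so a practical implementation would likely use a spatial index or an image-grid neighborhood to reduce the computation as well.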


2020 ◽  
Vol 34 (04) ◽  
pp. 3211-3218
Author(s):  
Liang Bai ◽  
Jiye Liang

Due to the complex structure of real-world data, nonlinearly separable clustering is one of the popular and widely studied clustering problems. Currently, various types of algorithms, such as kernel k-means, spectral clustering and density clustering, have been developed to solve this problem. However, it is difficult for them to balance the efficiency and effectiveness of clustering, which limits their real applications. To overcome this deficiency, we propose a three-level optimization model for nonlinearly separable clustering which divides the clustering problem into three sub-problems: a linearly separable clustering on the object set, a nonlinearly separable clustering on the cluster set and an ensemble clustering on the partition set. An iterative algorithm is proposed to solve the optimization problem. The proposed algorithm can effectively recognize nonlinearly separable clusters at low computational cost. The performance of this algorithm has been studied on synthetic and real data sets. Comparisons with other nonlinearly separable clustering algorithms illustrate the efficiency and effectiveness of the proposed algorithm.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Israel F. Araujo ◽  
Daniel K. Park ◽  
Francesco Petruccione ◽  
Adenilton J. da Silva

Abstract Advantages in several fields of research and industry are expected with the rise of quantum computers. However, the computational cost to load classical data in quantum computers can impose restrictions on possible quantum speedups. Known algorithms to create arbitrary quantum states require quantum circuits with depth O(N) to load an N-dimensional vector. Here, we show that it is possible to load an N-dimensional vector with exponential time advantage using a quantum circuit with polylogarithmic depth and entangled information in ancillary qubits. Results show that we can efficiently load data in quantum devices using a divide-and-conquer strategy to exchange computational time for space. We demonstrate a proof of concept on a real quantum device and present two applications for quantum machine learning. We expect that this new loading strategy allows the quantum speedup of tasks that require loading a significant volume of information to quantum devices.


Symmetry ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 596
Author(s):  
Krishna Kumar Sharma ◽  
Ayan Seal ◽  
Enrique Herrera-Viedma ◽  
Ondrej Krejcar

Calculating and monitoring customer churn metrics is important for companies to retain customers and earn more profit in business. In this study, a churn prediction framework is developed using modified spectral clustering (SC). The similarity measure plays an important role in clustering for predicting churn with better accuracy by analyzing industrial data. The linear Euclidean distance in the traditional SC is replaced by the non-linear S-distance (Sd). The Sd is deduced from the concept of S-divergence (SD). Several characteristics of Sd are discussed in this work. Assays are conducted to endorse the proposed clustering algorithm on four synthetic, eight UCI, and two industrial databases and one telecommunications database related to customer churn. Three existing clustering algorithms (k-means, density-based spatial clustering of applications with noise, and conventional SC) are also implemented on the above-mentioned 15 databases. The empirical outcomes show that the proposed clustering algorithm beats the three existing clustering algorithms in terms of its Jaccard index, f-score, recall, precision and accuracy. Finally, we also test the significance of the clustering results with Wilcoxon’s signed-rank test, Wilcoxon’s rank-sum test, and sign tests. The comparative study shows that the outcomes of the proposed algorithm are interesting, especially in the case of clusters of arbitrary shape.


Author(s):  
Tarun Gangwar ◽  
Dominik Schillinger

Abstract We present a concurrent material and structure optimization framework for multiphase hierarchical systems that relies on homogenization estimates based on continuum micromechanics to account for material behavior across many different length scales. We show that the analytical nature of these estimates enables material optimization via a series of inexpensive “discretization-free” constraint optimization problems whose computational cost is independent of the number of hierarchical scales involved. To illustrate the strength of this unique property, we define new benchmark tests with several material scales that for the first time become computationally feasible via our framework. We also outline its potential in engineering applications by reproducing self-optimizing mechanisms in the natural hierarchical system of bamboo culm tissue.


Mathematics ◽  
2021 ◽  
Vol 9 (15) ◽  
pp. 1773
Author(s):  
Bogdan Walek ◽  
Ondrej Pektor ◽  
Radim Farana

This paper describes a novel approach to evaluating suitable job applicants for various job positions, and specifies typical areas of requirement and their usage. Requirements for this decision-support system are defined so that it can be used in medium-sized companies. The tools chosen were fuzzy expert systems, primarily the Takagi-Sugeno type inference system, which were then supplemented with an implementation of methods of variant multi-criteria analysis. The resulting system is a flexible tool in which the importance of individual selection criteria can be set simply, so that it can be used in various situations, primarily in repeated selection procedures for similar job positions. Strong emphasis is placed on the explanatory module, which enables the results of the expert system to be used easily. Verification of the system on real data in cooperation with a collaborating company has shown that the system is easily usable.


2014 ◽  
Vol 1 (4) ◽  
pp. 256-265 ◽  
Author(s):  
Hong Seok Park ◽  
Trung Thanh Nguyen

Abstract Energy efficiency is an essential consideration in sustainable manufacturing. This study presents a car fender-based injection molding process optimization that aims to resolve the trade-off between energy consumption and product quality, with the process parameters as the optimization variables. The process is optimized by applying response surface methodology and using the nondominated sorting genetic algorithm II (NSGA II) in order to solve multi-objective optimization problems. To reduce computational cost and time in the problem-solving procedure, a combination of CAE-integration tools is employed. Based on the Pareto diagram, an appropriate solution is derived to obtain optimal parameters. The optimization results show that the proposed approach can effectively help engineers identify optimal process parameters and achieve competitive advantages in energy consumption and product quality. In addition, the paper discusses the engineering analysis that can be employed to conduct holistic optimization of the injection molding process in order to increase energy efficiency and product quality.


2014 ◽  
Vol 687-691 ◽  
pp. 1350-1353
Author(s):  
Li Li Fu ◽  
Yong Li Liu ◽  
Li Jing Hao

The spectral clustering algorithm is a clustering algorithm based on spectral graph theory. As spectral clustering has a deep theoretical foundation as well as an advantage in dealing with non-convex distributions, it has received much attention in the machine learning and data mining areas. The algorithm is easy to implement, and outperforms traditional clustering algorithms such as the K-means algorithm. This paper aims to give some intuition for spectral clustering. We describe different graph partition criteria, the definition of spectral clustering, and the clustering steps. Finally, some improvements that address the disadvantages of spectral clustering are introduced briefly.
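As a rough illustration of the clustering steps referred to above, the sketch below performs the simplest form of spectral clustering, a two-way partition. It builds the unnormalized graph Laplacian L = D - A, approximates the eigenvector of its second-smallest eigenvalue (the Fiedler vector), and splits nodes by sign. The power iteration, the sign-based split (in place of K-means on several eigenvectors), and all names are assumptions made to keep the example dependency-free, and the graph is assumed to be connected.

```python
def spectral_bipartition(n, edges, iterations=200):
    """Split a connected graph in two using the sign of its Fiedler vector."""
    # Step 1: unnormalized Laplacian L = D - A, stored densely.
    lap = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        lap[u][u] += 1.0
        lap[v][v] += 1.0
        lap[u][v] -= 1.0
        lap[v][u] -= 1.0
    # Step 2: power iteration on M = shift*I - L. With shift >= the largest
    # eigenvalue of L (at most twice the maximum degree), the smallest
    # eigenvalues of L become the largest eigenvalues of M.
    shift = 2.0 * max(lap[i][i] for i in range(n))
    x = [float(i) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(iterations):
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # remove the constant eigenvector
        x = [shift * x[i] - sum(lap[i][j] * x[j] for j in range(n))
             for i in range(n)]
        norm = sum(xi * xi for xi in x) ** 0.5
        x = [xi / norm for xi in x]
    # Step 3: assign clusters by the sign of each Fiedler-vector entry.
    return [1 if xi > 0 else 0 for xi in x]

# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
labels = spectral_bipartition(6, edges)
```

In practice, a library eigensolver and K-means on the leading eigenvectors would replace the power iteration and the sign split, and a normalized Laplacian is often preferred when node degrees vary widely.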


Author(s):  
Weilin Nie ◽  
Cheng Wang

Abstract Online learning is a classical algorithm for optimization problems. Due to its low computational cost, it has been widely used in many aspects of machine learning and statistical learning. Its convergence performance depends heavily on the step size. In this paper, a two-stage step size is proposed for the unregularized online learning algorithm based on reproducing kernels. Theoretically, we prove that such an algorithm can achieve a nearly min–max convergence rate, up to some logarithmic term, without any capacity condition.
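For orientation, a generic unregularized online kernel least-squares update has the form f_{t+1} = f_t - eta_t (f_t(x_t) - y_t) K(x_t, ·), and a minimal sketch is given below. The abstract does not state the two-stage step size, so the schedule here (constant for a fixed number of rounds, then decaying) is purely a hypothetical placeholder, as are the Gaussian kernel and every function and parameter name.

```python
import math

def gaussian_kernel(x, z, width=1.0):
    """Assumed reproducing kernel on the real line."""
    return math.exp(-((x - z) ** 2) / (2.0 * width ** 2))

def two_stage_step(t, switch=10, eta=0.5):
    """Hypothetical two-stage schedule, NOT the one proposed in the paper:
    constant for the first `switch` rounds, then decaying like 1/sqrt."""
    return eta if t < switch else eta / math.sqrt(t - switch + 1)

def online_kernel_regression(samples, step=two_stage_step, kernel=gaussian_kernel):
    """Process (x, y) pairs one at a time and return the learned predictor."""
    centers, coeffs = [], []

    def predict(x):
        # f(x) is a kernel expansion over all points seen so far.
        return sum(c * kernel(z, x) for z, c in zip(centers, coeffs))

    for t, (x, y) in enumerate(samples):
        residual = predict(x) - y            # gradient of the squared loss at f_t
        centers.append(x)
        coeffs.append(-step(t) * residual)   # f_{t+1} = f_t - eta_t * residual * K(x_t, .)
    return predict

# Seeing the same example twice moves the prediction halfway to the target each time.
f = online_kernel_regression([(0.0, 1.0), (0.0, 1.0)])
```

Each round costs one evaluation of the current predictor, which reflects the low per-step cost noted in the abstract. The choice of step sizes governs how quickly the predictor converges.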

