Efficient size-bounded community search over large networks

2021 ◽  
Vol 14 (8) ◽  
pp. 1441-1453
Author(s):  
Kai Yao ◽  
Lijun Chang

The problem of community search, which aims to find a cohesive subgraph containing user-given query vertices, has been extensively studied recently. Most existing studies focus on the cohesiveness of the returned community while ignoring its size, and may yield communities of very large sizes. However, many applications naturally require that the number of vertices/members in a community fall within a certain range. In this paper, we design exact algorithms for the general size-bounded community search problem, which aims to find a subgraph with the largest min-degree among all connected subgraphs that contain the query vertex q and have at least l and at most h vertices, where q, l, and h are specified by the query. As the problem is NP-hard, we propose a branch-reduce-and-bound algorithm, SC-BRB, by developing nontrivial reduction, upper-bounding, and branching techniques. Experiments on large real graphs show that SC-BRB on average increases the minimum degree of the community returned by the state-of-the-art heuristic algorithm GreedyF by a factor of 2.41 and increases the edge density by a factor of 2.2. In addition, SC-BRB is several orders of magnitude faster than a baseline approach, and all of our proposed techniques contribute to its efficiency.
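
To make the objective concrete, here is a minimal Python sketch (with a hypothetical adjacency-dict graph representation) of how a candidate community would be scored in this problem: a vertex set is feasible only if it contains q, has between l and h vertices, and induces a connected subgraph, and its score is the minimum induced degree. This illustrates the search objective only, not the SC-BRB algorithm itself.

```python
from collections import deque

def min_degree_if_valid(adj, community, q, l, h):
    """Score a candidate for size-bounded community search.

    Returns the minimum degree of the subgraph induced by `community`
    if it contains q, has between l and h vertices, and is connected;
    otherwise returns None. (Illustrative helper, not SC-BRB.)
    """
    nodes = set(community)
    if q not in nodes or not (l <= len(nodes) <= h):
        return None
    # BFS from q to verify the induced subgraph is connected.
    seen, frontier = {q}, deque([q])
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                frontier.append(v)
    if seen != nodes:
        return None
    # Objective: the smallest induced degree over the community.
    return min(sum(v in nodes for v in adj[u]) for u in nodes)
```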

2021 ◽  
Vol 14 (11) ◽  
pp. 2006-2018
Author(s):  
Zheng Dong ◽  
Xin Huang ◽  
Guorui Yuan ◽  
Hengshu Zhu ◽  
Hui Xiong

Community search aims at finding densely connected subgraphs containing query vertices in a graph. While this task has been widely studied in the literature, most existing works focus on finding homogeneous communities rather than heterogeneous communities with different labels. In this paper, we motivate a new problem of cross-group community search, namely Butterfly-Core Community (BCC) search, over a labeled graph, where each vertex has a label indicating its properties and an edge between two vertices indicates their cross relationship. Specifically, for two query vertices with different labels, we aim to find a densely connected cross community that contains both query vertices and consists of butterfly networks, where each wing of the butterflies is induced by a k-core search based on one query vertex and the two wings are connected by these butterflies. We first develop a heuristic algorithm that achieves a 2-approximation to the optimal solution. Furthermore, we design fast techniques for query distance computation, leader pair identification, and index-based BCC local exploration. Extensive experiments on seven real datasets and four useful case studies validate the effectiveness and efficiency of our BCC model and its multi-labeled extensions.
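
The k-core that induces each wing is a standard building block. Below is a minimal Python sketch of k-core computation by iterative peeling, under the usual definition (the maximal subgraph in which every vertex has degree at least k); the butterfly and BCC machinery from the paper is not reproduced here.

```python
def k_core(adj, k):
    """Peel vertices of degree < k until a k-core remains.

    `adj` maps each vertex to the set of its neighbours. Returns the
    vertex set of the k-core (possibly empty). A building block of
    BCC search only, not the full algorithm.
    """
    deg = {u: len(ns) for u, ns in adj.items()}
    queue = [u for u, d in deg.items() if d < k]
    removed = set()
    while queue:
        u = queue.pop()
        if u in removed:
            continue
        removed.add(u)
        for v in adj[u]:
            if v not in removed:
                deg[v] -= 1
                if deg[v] < k:
                    queue.append(v)
    return {u for u in adj if u not in removed}
```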


2021 ◽  
Vol 15 (6) ◽  
pp. 1-27
Author(s):  
Marco Bressan ◽  
Stefano Leucci ◽  
Alessandro Panconesi

We address the problem of computing the distribution of induced connected subgraphs, a.k.a. graphlets or motifs, in large graphs. The current state-of-the-art algorithms estimate motif counts via uniform sampling by leveraging the color-coding technique of Alon, Yuster, and Zwick. In this work, we extend the applicability of this approach by introducing a set of algorithmic optimizations and techniques that reduce the running time and space usage of color coding and improve the accuracy of the counts. To this end, we first show how to optimize color coding to efficiently build a compact table of a representative subsample of all graphlets in the input graph. For 8-node motifs, we can build such a table in one hour for a graph with 65M nodes and 1.8B edges, larger than what the state of the art can handle. We then introduce a novel adaptive sampling scheme that breaks the "additive error barrier" of uniform sampling, guaranteeing multiplicative approximations instead of just additive ones. This allows us to count not only the most frequent motifs but also extremely rare ones. For instance, on one graph we accurately count nearly 10,000 distinct 8-node motifs whose relative frequency is so small that uniform sampling would take centuries to find them. Our results show that color coding is still the most promising approach to scalable motif counting.
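
To illustrate the color-coding idea underlying this line of work, the following toy Python sketch (function name hypothetical) estimates the number of k-vertex simple paths, the classic Alon-Yuster-Zwick example: vertices are colored uniformly at random with k colors, "colorful" paths (all colors distinct) are counted exactly by dynamic programming over color sets, and the count is rescaled by the probability k!/k^k that a fixed k-path is colorful. This is a minimal version of the technique the paper optimizes, not the authors' implementation.

```python
import random
from math import factorial

def estimate_k_paths(adj, k, trials=10):
    """Colour-coding estimate of the number of simple k-vertex paths."""
    estimates = []
    for _ in range(trials):
        colour = {v: random.randrange(k) for v in adj}
        # table[v] maps a frozenset of colours S to the number of
        # colourful paths ending at v that use exactly the colours in S.
        table = {v: {frozenset([colour[v]]): 1} for v in adj}
        for _ in range(k - 1):
            new = {v: {} for v in adj}
            for v in adj:
                for u in adj[v]:
                    for S, cnt in table[u].items():
                        if colour[v] not in S:
                            T = S | {colour[v]}
                            new[v][T] = new[v].get(T, 0) + cnt
            table = new
        # Each undirected path is counted once from each endpoint.
        colourful = sum(sum(t.values()) for t in table.values()) // 2
        # Rescale by the probability k!/k**k that a path is colourful.
        estimates.append(colourful * k**k / factorial(k))
    return sum(estimates) / trials
```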


2021 ◽  
Vol 50 (1) ◽  
pp. 33-40
Author(s):  
Chenhao Ma ◽  
Yixiang Fang ◽  
Reynold Cheng ◽  
Laks V.S. Lakshmanan ◽  
Wenjie Zhang ◽  
...  

Given a directed graph G, the directed densest subgraph (DDS) problem refers to finding a subgraph of G whose density is the highest among all subgraphs of G. The DDS problem is fundamental to a wide range of applications, such as fraud detection, community mining, and graph compression. However, existing DDS solutions suffer from efficiency and scalability problems: on a 3,000-edge graph, it takes three days for one of the best exact algorithms to complete. In this paper, we develop an efficient and scalable DDS solution. We introduce the notion of the [x, y]-core, which is a dense subgraph of G, and show that the densest subgraph can be accurately located through the [x, y]-core with theoretical guarantees. Based on the [x, y]-core, we develop both exact and approximation algorithms. We have performed an extensive evaluation of our approaches on eight real large datasets. The results show that our proposed solutions are up to six orders of magnitude faster than the state-of-the-art.
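
The abstract does not restate the density objective, so assume the standard one from the DDS literature (due to Kannan and Vinay): for a vertex pair (S, T), the density is |E(S, T)| / √(|S| · |T|), where E(S, T) is the set of edges from S to T. A direct Python computation, for illustration only:

```python
from math import sqrt

def directed_density(edges, S, T):
    """Directed density of a vertex pair (S, T): edges from S to T
    divided by sqrt(|S| * |T|). Makes the DDS objective concrete;
    this is not the paper's [x, y]-core algorithm.
    """
    S, T = set(S), set(T)
    cross = sum(1 for u, v in edges if u in S and v in T)
    return cross / sqrt(len(S) * len(T))
```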


2021 ◽  
Vol 7 (2) ◽  
pp. 21
Author(s):  
Roland Perko ◽  
Manfred Klopschitz ◽  
Alexander Almer ◽  
Peter M. Roth

Many scientific studies deal with person counting and density estimation from single images. Recently, convolutional neural networks (CNNs) have been applied to these tasks. Even though better results are often reported, it is frequently unclear where the improvements come from and whether the proposed approaches generalize. Thus, the main goal of this paper is to identify the critical aspects of these tasks and to show how they limit state-of-the-art approaches. Based on these findings, we show how to mitigate these limitations. To this end, we implemented a CNN-based baseline approach, which we extended to deal with the identified problems. These include bias discovered in the reference data sets, ambiguity in ground truth generation, and the mismatch between evaluation metrics and the training loss function. The experimental results show that our modifications significantly outperform the baseline in terms of the accuracy of person counts and density estimation. In this way, we gain a deeper understanding of CNN-based person density estimation beyond the network architecture. Furthermore, our insights can help advance the field of person density estimation in general by highlighting current limitations in its evaluation protocols.
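
The loss/metric mismatch mentioned above is easy to state in code: CNN crowd counters are commonly trained with a pixel-wise MSE on density maps but evaluated with MAE/RMSE on the integrated person counts. The sketch below is a generic illustration of that gap under those common conventions, not the authors' exact setup.

```python
import numpy as np

def density_losses(pred, gt):
    """Contrast the typical training loss with the evaluation metric.

    `pred` and `gt` are (H, W) density maps whose pixel values sum to
    the person count in the image.
    """
    pixel_mse = float(np.mean((pred - gt) ** 2))  # common training loss
    count_err = abs(pred.sum() - gt.sum())        # per-image MAE term used in evaluation
    return pixel_mse, count_err
```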


2020 ◽  
Vol 14 (4) ◽  
pp. 573-585
Author(s):  
Guimu Guo ◽  
Da Yan ◽  
M. Tamer Özsu ◽  
Zhe Jiang ◽  
Jalal Khalil

Given a user-specified minimum degree threshold γ, a γ-quasi-clique is a subgraph g = (V_g, E_g) in which each vertex v ∈ V_g connects to at least a γ fraction of the other vertices (i.e., ⌈γ · (|V_g| − 1)⌉ vertices) in g. The quasi-clique is one of the most natural definitions of a dense structure, useful for finding communities in social networks and for discovering significant biomolecular structures and pathways. However, mining maximal quasi-cliques is notoriously expensive. In this paper, we design parallel algorithms for mining maximal quasi-cliques on G-thinker, a distributed graph mining framework that decomposes mining into compute-intensive tasks to fully utilize CPU cores. We found that using G-thinker directly results in the straggler problem, due to (i) the drastic load imbalance among different tasks and (ii) the difficulty of predicting task running times. We address these challenges by redesigning G-thinker's execution engine to prioritize long-running tasks for execution, and by utilizing a novel timeout strategy that effectively decomposes long-running tasks to improve load balancing. While this system redesign applies to many other expensive dense subgraph mining problems, this paper verifies the idea by adapting the state-of-the-art quasi-clique algorithm, Quick, to our redesigned G-thinker. Extensive experiments verify that our new solution scales well with the number of CPU cores, achieving a 201× runtime speedup when mining a graph with 3.77M vertices and 16.5M edges on a 16-node cluster.
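
The definition above translates directly into a checker, sketched below in Python. Verifying a given vertex set is trivial; the hard part, which the paper parallelizes, is mining all *maximal* γ-quasi-cliques.

```python
from math import ceil

def is_gamma_quasi_clique(adj, nodes, gamma):
    """Check the gamma-quasi-clique definition: every vertex of the
    induced subgraph must connect to at least ceil(gamma * (|V_g| - 1))
    of the other vertices. Illustrative only; not the Quick algorithm.
    """
    nodes = set(nodes)
    need = ceil(gamma * (len(nodes) - 1))
    return all(sum(v in nodes for v in adj[u]) >= need for u in nodes)
```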


In this paper, we propose a new heuristic algorithm for solving the maximum clique problem (MCP). While the proposed algorithm, called TrustCLQ, uses a general approach to solving the MCP, it is almost independent of the order of vertices and does not exploit a partition of the graph into independent sets. The algorithm was tested on DIMACS library graphs, which are often employed for testing MCP solution algorithms. TrustCLQ was compared with the well-known ILS heuristic algorithm (as well as with a standard algorithm from the networkx library) on the DIMACS data sets. Moreover, TrustCLQ has also been tested on Facebook social graphs.
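
For context, the standard networkx routine referred to above is presumably the heuristic in networkx.algorithms.approximation; a minimal usage sketch follows, with a random graph standing in for a DIMACS instance (which would normally be parsed from a .clq file).

```python
import networkx as nx
from networkx.algorithms import approximation

# Random graph as a stand-in for a DIMACS benchmark instance.
G = nx.gnp_random_graph(200, 0.3, seed=42)

# Stock heuristic baseline: returns a clique, with no optimality guarantee.
clique = approximation.max_clique(G)
print(len(clique), sorted(clique))
```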


2021 ◽  
Vol 46 (4) ◽  
pp. 1-45
Author(s):  
Chenhao Ma ◽  
Yixiang Fang ◽  
Reynold Cheng ◽  
Laks V. S. Lakshmanan ◽  
Wenjie Zhang ◽  
...  

Given a directed graph G, the directed densest subgraph (DDS) problem refers to finding a subgraph of G whose density is the highest among all subgraphs of G. The DDS problem is fundamental to a wide range of applications, such as fraud detection, community mining, and graph compression. However, existing DDS solutions suffer from efficiency and scalability problems: on a 3,000-edge graph, it takes three days for one of the best exact algorithms to complete. In this article, we develop an efficient and scalable DDS solution. We introduce the notion of the [x, y]-core, which is a dense subgraph of G, and show that the densest subgraph can be accurately located through the [x, y]-core with theoretical guarantees. Based on the [x, y]-core, we develop exact and approximation algorithms. We further study the problems of maintaining the DDS over dynamic directed graphs and finding the weighted DDS on weighted directed graphs, and we develop efficient non-trivial algorithms to solve these two problems by extending our DDS algorithms. We have performed an extensive evaluation of our approaches on 15 real large datasets. The results show that our proposed solutions are up to six orders of magnitude faster than the state-of-the-art.
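
The abstract does not define the [x, y]-core, so the sketch below assumes the natural reading: a pair of vertex sets (S, T) such that every vertex in S has at least x out-neighbours in T and every vertex in T has at least y in-neighbours in S, computed by iterative peeling. Treat the definitional details as assumptions, not the paper's exact construction.

```python
def xy_core(out_adj, in_adj, x, y):
    """Peel to a fixpoint where every u in S has >= x out-neighbours in T
    and every v in T has >= y in-neighbours in S. `out_adj`/`in_adj` map
    every vertex to its out-/in-neighbour sets. Assumed definition.
    """
    S = set(out_adj)  # candidate source side
    T = set(in_adj)   # candidate target side
    changed = True
    while changed:
        changed = False
        keep_S = {u for u in S if sum(v in T for v in out_adj[u]) >= x}
        keep_T = {v for v in T if sum(u in S for u in in_adj[v]) >= y}
        if keep_S != S or keep_T != T:
            S, T, changed = keep_S, keep_T, True
    return S, T
```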


Author(s):  
Jairo R. Montoya-Torres ◽  
Libardo S. Gómez-Vizcaíno ◽  
Elyn L. Solano-Charris ◽  
Carlos D. Paternina-Arboleda

This paper examines the problem of job-shop scheduling with either makespan minimization or total tardiness minimization, both of which are known to be NP-hard. The authors propose a meta-heuristic procedure inspired by bacterial phototaxis. This procedure, called Global Bacteria Optimization (GBO), emulates the reaction of some organisms (bacteria) to light stimulation. Computational experiments are performed using well-known instances from the literature. The results show that the algorithm matches and even outperforms previous state-of-the-art procedures in terms of solution quality while requiring very short computation times.
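
For reference, the two objectives named above have standard definitions: the makespan is the latest job completion time, and the total tardiness is the sum of each job's lateness past its due date, clipped at zero. A generic evaluation helper (independent of the GBO procedure itself):

```python
def schedule_objectives(completion, due):
    """Standard job-shop objectives for a schedule summarised by per-job
    completion times. `completion[j]` and `due[j]` are the completion
    time and due date of job j.
    """
    makespan = max(completion.values())
    total_tardiness = sum(max(0, completion[j] - due[j]) for j in completion)
    return makespan, total_tardiness
```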


Mathematics ◽  
2021 ◽  
Vol 9 (12) ◽  
pp. 1380
Author(s):  
Noelia Rico ◽  
Camino R. Vela ◽  
Raúl Pérez-Fernández ◽  
Irene Díaz

Preference aggregation, and in particular ranking aggregation, is mainly studied in the field of social choice theory but is extensively applied in a variety of contexts. Among the most prominent methods for ranking aggregation, the Kemeny method has been proved to be the only one that simultaneously satisfies desirable properties such as neutrality, consistency, and the Condorcet condition. Unfortunately, the problem of finding a Kemeny ranking is NP-hard, which prevents practitioners from using it in real-life problems. The state of the art in exact algorithms for computing the Kemeny ranking experienced a major boost last year with the presentation of an algorithm that provides a search-time guarantee for up to 13 alternatives. In this work, we propose an enhanced version of this algorithm based on pruning the search space when certain Condorcet properties hold. This enhanced version greatly improves performance in terms of runtime.
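
To ground the discussion, a Kemeny ranking minimizes the total Kendall tau distance (the number of pairwise disagreements) to the input rankings. The brute-force Python sketch below enumerates all m! candidate rankings, which is feasible only for a handful of alternatives, which is precisely why pruning-based exact algorithms like the one enhanced here matter.

```python
from itertools import combinations, permutations

def kendall_tau(r1, r2):
    """Number of pairwise disagreements between two rankings
    (each ranking is a tuple of alternatives, best first)."""
    pos1 = {a: i for i, a in enumerate(r1)}
    pos2 = {a: i for i, a in enumerate(r2)}
    return sum(
        1
        for a, b in combinations(r1, 2)
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0
    )

def kemeny_ranking(votes):
    """Exhaustive Kemeny aggregation: return a ranking minimising the
    sum of Kendall tau distances to all votes."""
    alternatives = votes[0]
    return min(
        permutations(alternatives),
        key=lambda r: sum(kendall_tau(r, v) for v in votes),
    )

# Example: kemeny_ranking([('a','b','c'), ('b','a','c'), ('a','c','b')])
```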

