Efficient size-bounded community search over large networks

2021 ◽  
Vol 14 (8) ◽  
pp. 1441-1453
Author(s):  
Kai Yao ◽  
Lijun Chang

The problem of community search, which aims to find a cohesive subgraph containing user-given query vertices, has been extensively studied recently. Most existing studies focus on the cohesiveness of the returned community while ignoring its size, and may yield communities of very large sizes. However, many applications naturally require that the number of vertices/members in a community fall within a certain range. In this paper, we design exact algorithms for the general size-bounded community search problem, which aims to find a subgraph with the largest min-degree among all connected subgraphs that contain the query vertex q and have at least l and at most h vertices, where q, l, and h are specified by the query. As the problem is NP-hard, we propose a branch-reduce-and-bound algorithm, SC-BRB, by developing nontrivial reduction, upper-bounding, and branching techniques. Experiments on large real graphs show that SC-BRB on average increases the minimum degree of the community returned by the state-of-the-art heuristic algorithm GreedyF by a factor of 2.41 and increases the edge density by a factor of 2.2. In addition, SC-BRB is several orders of magnitude faster than a baseline approach, and all of our proposed techniques contribute to its efficiency.
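
To make the objective concrete, here is a minimal Python sketch (with a hypothetical adjacency-dict graph representation) of how a candidate community would be scored in this problem: a vertex set is feasible only if it contains q, has between l and h vertices, and induces a connected subgraph, and its score is the minimum induced degree. This illustrates the search objective only, not the SC-BRB algorithm itself.

```python
from collections import deque

def min_degree_if_valid(adj, community, q, l, h):
    """Score a candidate for size-bounded community search.

    Returns the minimum degree of the subgraph induced by `community`
    if it contains q, has between l and h vertices, and is connected;
    otherwise returns None. (Illustrative helper, not SC-BRB.)
    """
    nodes = set(community)
    if q not in nodes or not (l <= len(nodes) <= h):
        return None
    # BFS from q to verify the induced subgraph is connected.
    seen, frontier = {q}, deque([q])
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                frontier.append(v)
    if seen != nodes:
        return None
    # Objective: the smallest induced degree over the community.
    return min(sum(v in nodes for v in adj[u]) for u in nodes)
```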

2021 ◽  
Vol 14 (11) ◽  
pp. 2006-2018
Author(s):  
Zheng Dong ◽  
Xin Huang ◽  
Guorui Yuan ◽  
Hengshu Zhu ◽  
Hui Xiong

Community search aims at finding densely connected subgraphs containing query vertices in a graph. While this task has been widely studied in the literature, most existing works focus on finding homogeneous communities rather than heterogeneous communities with different labels. In this paper, we motivate a new problem of cross-group community search, namely Butterfly-Core Community (BCC) search, over a labeled graph, where each vertex has a label indicating its properties and an edge between two vertices indicates their cross relationship. Specifically, for two query vertices with different labels, we aim to find a densely connected cross community that contains both query vertices and consists of butterfly networks, where each wing of the butterflies is induced by a k-core search based on one query vertex and the two wings are connected by these butterflies. We first develop a heuristic algorithm that achieves a 2-approximation to the optimal solution. Furthermore, we design fast techniques for query distance computation, leader pair identification, and index-based BCC local exploration. Extensive experiments on seven real datasets and four useful case studies validate the effectiveness and efficiency of our BCC model and its multi-labeled extensions.
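
The k-core that induces each wing is a standard building block. Below is a minimal Python sketch of k-core computation by iterative peeling, under the usual definition (the maximal subgraph in which every vertex has degree at least k); the butterfly and BCC machinery from the paper is not reproduced here.

```python
def k_core(adj, k):
    """Peel vertices of degree < k until a k-core remains.

    `adj` maps each vertex to the set of its neighbours. Returns the
    vertex set of the k-core (possibly empty). A building block of
    BCC search only, not the full algorithm.
    """
    deg = {u: len(ns) for u, ns in adj.items()}
    queue = [u for u, d in deg.items() if d < k]
    removed = set()
    while queue:
        u = queue.pop()
        if u in removed:
            continue
        removed.add(u)
        for v in adj[u]:
            if v not in removed:
                deg[v] -= 1
                if deg[v] < k:
                    queue.append(v)
    return {u for u in adj if u not in removed}
```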


2021 ◽  
Vol 15 (6) ◽  
pp. 1-27
Author(s):  
Marco Bressan ◽  
Stefano Leucci ◽  
Alessandro Panconesi

We address the problem of computing the distribution of induced connected subgraphs, a.k.a. graphlets or motifs, in large graphs. The current state-of-the-art algorithms estimate motif counts via uniform sampling by leveraging the color-coding technique of Alon, Yuster, and Zwick. In this work, we extend the applicability of this approach by introducing a set of algorithmic optimizations and techniques that reduce the running time and space usage of color coding and improve the accuracy of the counts. To this end, we first show how to optimize color coding to efficiently build a compact table of a representative subsample of all graphlets in the input graph. For 8-node motifs, we can build such a table in one hour for a graph with 65M nodes and 1.8B edges, larger than what the state of the art can handle. We then introduce a novel adaptive sampling scheme that breaks the "additive error barrier" of uniform sampling, guaranteeing multiplicative approximations instead of just additive ones. This allows us to count not only the most frequent motifs but also extremely rare ones. For instance, on one graph we accurately count nearly 10,000 distinct 8-node motifs whose relative frequency is so small that uniform sampling would take centuries to find them. Our results show that color coding is still the most promising approach to scalable motif counting.
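
To illustrate the color-coding idea underlying this line of work, the following toy Python sketch (function name hypothetical) estimates the number of k-vertex simple paths, the classic Alon-Yuster-Zwick example: vertices are colored uniformly at random with k colors, "colorful" paths (all colors distinct) are counted exactly by dynamic programming over color sets, and the count is rescaled by the probability k!/k^k that a fixed k-path is colorful. This is a minimal version of the technique the paper optimizes, not the authors' implementation.

```python
import random
from math import factorial

def estimate_k_paths(adj, k, trials=10):
    """Colour-coding estimate of the number of simple k-vertex paths."""
    estimates = []
    for _ in range(trials):
        colour = {v: random.randrange(k) for v in adj}
        # table[v] maps a frozenset of colours S to the number of
        # colourful paths ending at v that use exactly the colours in S.
        table = {v: {frozenset([colour[v]]): 1} for v in adj}
        for _ in range(k - 1):
            new = {v: {} for v in adj}
            for v in adj:
                for u in adj[v]:
                    for S, cnt in table[u].items():
                        if colour[v] not in S:
                            T = S | {colour[v]}
                            new[v][T] = new[v].get(T, 0) + cnt
            table = new
        # Each undirected path is counted once from each endpoint.
        colourful = sum(sum(t.values()) for t in table.values()) // 2
        # Rescale by the probability k!/k**k that a path is colourful.
        estimates.append(colourful * k**k / factorial(k))
    return sum(estimates) / trials
```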


2021 ◽  
Vol 50 (1) ◽  
pp. 33-40
Author(s):  
Chenhao Ma ◽  
Yixiang Fang ◽  
Reynold Cheng ◽  
Laks V.S. Lakshmanan ◽  
Wenjie Zhang ◽  
...  

Given a directed graph G, the directed densest subgraph (DDS) problem refers to finding a subgraph of G whose density is the highest among all subgraphs of G. The DDS problem is fundamental to a wide range of applications, such as fraud detection, community mining, and graph compression. However, existing DDS solutions suffer from efficiency and scalability problems: on a 3,000-edge graph, it takes three days for one of the best exact algorithms to complete. In this paper, we develop an efficient and scalable DDS solution. We introduce the notion of the [x, y]-core, which is a dense subgraph of G, and show that the densest subgraph can be accurately located through the [x, y]-core with theoretical guarantees. Based on the [x, y]-core, we develop both exact and approximation algorithms. We have performed an extensive evaluation of our approaches on eight real large datasets. The results show that our proposed solutions are up to six orders of magnitude faster than the state-of-the-art.
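
The abstract does not restate the density objective, so assume the standard one from the DDS literature (due to Kannan and Vinay): for a vertex pair (S, T), the density is |E(S, T)| / √(|S| · |T|), where E(S, T) is the set of edges from S to T. A direct Python computation, for illustration only:

```python
from math import sqrt

def directed_density(edges, S, T):
    """Directed density of a vertex pair (S, T): edges from S to T
    divided by sqrt(|S| * |T|). Makes the DDS objective concrete;
    this is not the paper's [x, y]-core algorithm.
    """
    S, T = set(S), set(T)
    cross = sum(1 for u, v in edges if u in S and v in T)
    return cross / sqrt(len(S) * len(T))
```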


2021 ◽  
Vol 7 (2) ◽  
pp. 21
Author(s):  
Roland Perko ◽  
Manfred Klopschitz ◽  
Alexander Almer ◽  
Peter M. Roth

Many scientific studies deal with person counting and density estimation from single images. Recently, convolutional neural networks (CNNs) have been applied to these tasks. Even though better results are often reported, it is frequently unclear where the improvements come from and whether the proposed approaches generalize. Thus, the main goal of this paper is to identify the critical aspects of these tasks and to show how they limit state-of-the-art approaches. Based on these findings, we show how to mitigate these limitations. To this end, we implemented a CNN-based baseline approach, which we extended to deal with the identified problems. These include bias discovered in the reference data sets, ambiguity in ground truth generation, and the mismatch between evaluation metrics and the training loss function. The experimental results show that our modifications significantly outperform the baseline in terms of the accuracy of person counts and density estimation. In this way, we gain a deeper understanding of CNN-based person density estimation beyond the network architecture. Furthermore, our insights can help advance the field of person density estimation in general by highlighting current limitations in its evaluation protocols.
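
The loss/metric mismatch mentioned above is easy to state in code: CNN crowd counters are commonly trained with a pixel-wise MSE on density maps but evaluated with MAE/RMSE on the integrated person counts. The sketch below is a generic illustration of that gap under those common conventions, not the authors' exact setup.

```python
import numpy as np

def density_losses(pred, gt):
    """Contrast the typical training loss with the evaluation metric.

    `pred` and `gt` are (H, W) density maps whose pixel values sum to
    the person count in the image.
    """
    pixel_mse = float(np.mean((pred - gt) ** 2))  # common training loss
    count_err = abs(pred.sum() - gt.sum())        # per-image MAE term used in evaluation
    return pixel_mse, count_err
```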


2020 ◽  
Vol 14 (4) ◽  
pp. 573-585
Author(s):  
Guimu Guo ◽  
Da Yan ◽  
M. Tamer Özsu ◽  
Zhe Jiang ◽  
Jalal Khalil

Given a user-specified minimum degree threshold γ, a γ-quasi-clique is a subgraph g = (V_g, E_g) in which each vertex v ∈ V_g connects to at least a γ fraction of the other vertices (i.e., ⌈γ · (|V_g| − 1)⌉ vertices) in g. The quasi-clique is one of the most natural definitions of a dense structure, useful for finding communities in social networks and for discovering significant biomolecular structures and pathways. However, mining maximal quasi-cliques is notoriously expensive. In this paper, we design parallel algorithms for mining maximal quasi-cliques on G-thinker, a distributed graph mining framework that decomposes mining into compute-intensive tasks to fully utilize CPU cores. We found that using G-thinker directly results in the straggler problem, due to (i) the drastic load imbalance among different tasks and (ii) the difficulty of predicting task running times. We address these challenges by redesigning G-thinker's execution engine to prioritize long-running tasks for execution, and by utilizing a novel timeout strategy that effectively decomposes long-running tasks to improve load balancing. While this system redesign applies to many other expensive dense subgraph mining problems, this paper verifies the idea by adapting the state-of-the-art quasi-clique algorithm, Quick, to our redesigned G-thinker. Extensive experiments verify that our new solution scales well with the number of CPU cores, achieving a 201× runtime speedup when mining a graph with 3.77M vertices and 16.5M edges on a 16-node cluster.
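
The definition above translates directly into a checker, sketched below in Python. Verifying a given vertex set is trivial; the hard part, which the paper parallelizes, is mining all *maximal* γ-quasi-cliques.

```python
from math import ceil

def is_gamma_quasi_clique(adj, nodes, gamma):
    """Check the gamma-quasi-clique definition: every vertex of the
    induced subgraph must connect to at least ceil(gamma * (|V_g| - 1))
    of the other vertices. Illustrative only; not the Quick algorithm.
    """
    nodes = set(nodes)
    need = ceil(gamma * (len(nodes) - 1))
    return all(sum(v in nodes for v in adj[u]) >= need for u in nodes)
```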


In this paper, we propose a new heuristic algorithm for solving the maximum clique problem (MCP). While the proposed algorithm, called TrustCLQ, uses a general approach to solving the MCP, it is almost independent of the order of vertices and does not exploit a partition of the graph into independent sets. The algorithm was tested on DIMACS library graphs, which are often employed for testing MCP solution algorithms. TrustCLQ was compared with the well-known ILS heuristic algorithm (as well as with a standard algorithm from the networkx library) on the DIMACS data sets. Moreover, TrustCLQ has also been tested on Facebook social graphs.
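
For context, the standard networkx routine referred to above is presumably the heuristic in networkx.algorithms.approximation; a minimal usage sketch follows, with a random graph standing in for a DIMACS instance (which would normally be parsed from a .clq file).

```python
import networkx as nx
from networkx.algorithms import approximation

# Random graph as a stand-in for a DIMACS benchmark instance.
G = nx.gnp_random_graph(200, 0.3, seed=42)

# Stock heuristic baseline: returns a clique, with no optimality guarantee.
clique = approximation.max_clique(G)
print(len(clique), sorted(clique))
```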


2021 ◽  
Vol 46 (4) ◽  
pp. 1-45
Author(s):  
Chenhao Ma ◽  
Yixiang Fang ◽  
Reynold Cheng ◽  
Laks V. S. Lakshmanan ◽  
Wenjie Zhang ◽  
...  

Given a directed graph G, the directed densest subgraph (DDS) problem refers to finding a subgraph of G whose density is the highest among all subgraphs of G. The DDS problem is fundamental to a wide range of applications, such as fraud detection, community mining, and graph compression. However, existing DDS solutions suffer from efficiency and scalability problems: on a 3,000-edge graph, it takes three days for one of the best exact algorithms to complete. In this article, we develop an efficient and scalable DDS solution. We introduce the notion of the [x, y]-core, which is a dense subgraph of G, and show that the densest subgraph can be accurately located through the [x, y]-core with theoretical guarantees. Based on the [x, y]-core, we develop exact and approximation algorithms. We further study the problems of maintaining the DDS over dynamic directed graphs and finding the weighted DDS on weighted directed graphs, and we develop efficient non-trivial algorithms to solve these two problems by extending our DDS algorithms. We have performed an extensive evaluation of our approaches on 15 real large datasets. The results show that our proposed solutions are up to six orders of magnitude faster than the state-of-the-art.
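
The abstract does not define the [x, y]-core, so the sketch below assumes the natural reading: a pair of vertex sets (S, T) such that every vertex in S has at least x out-neighbours in T and every vertex in T has at least y in-neighbours in S, computed by iterative peeling. Treat the definitional details as assumptions, not the paper's exact construction.

```python
def xy_core(out_adj, in_adj, x, y):
    """Peel to a fixpoint where every u in S has >= x out-neighbours in T
    and every v in T has >= y in-neighbours in S. `out_adj`/`in_adj` map
    every vertex to its out-/in-neighbour sets. Assumed definition.
    """
    S = set(out_adj)  # candidate source side
    T = set(in_adj)   # candidate target side
    changed = True
    while changed:
        changed = False
        keep_S = {u for u in S if sum(v in T for v in out_adj[u]) >= x}
        keep_T = {v for v in T if sum(u in S for u in in_adj[v]) >= y}
        if keep_S != S or keep_T != T:
            S, T, changed = keep_S, keep_T, True
    return S, T
```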


Author(s):  
Jairo R. Montoya-Torres ◽  
Libardo S. Gómez-Vizcaíno ◽  
Elyn L. Solano-Charris ◽  
Carlos D. Paternina-Arboleda

This paper examines the problem of job-shop scheduling with either makespan minimization or total tardiness minimization, both of which are known to be NP-hard. The authors propose a meta-heuristic procedure inspired by bacterial phototaxis. This procedure, called Global Bacteria Optimization (GBO), emulates the reaction of some organisms (bacteria) to light stimulation. Computational experiments are performed using well-known instances from the literature. The results show that the algorithm matches and even outperforms previous state-of-the-art procedures in terms of solution quality while requiring very short computation times.
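
For reference, the two objectives named above have standard definitions: the makespan is the latest job completion time, and the total tardiness is the sum of each job's lateness past its due date, clipped at zero. A generic evaluation helper (independent of the GBO procedure itself):

```python
def schedule_objectives(completion, due):
    """Standard job-shop objectives for a schedule summarised by per-job
    completion times. `completion[j]` and `due[j]` are the completion
    time and due date of job j.
    """
    makespan = max(completion.values())
    total_tardiness = sum(max(0, completion[j] - due[j]) for j in completion)
    return makespan, total_tardiness
```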


Mathematics ◽  
2021 ◽  
Vol 9 (12) ◽  
pp. 1380
Author(s):  
Noelia Rico ◽  
Camino R. Vela ◽  
Raúl Pérez-Fernández ◽  
Irene Díaz

Preference aggregation, and in particular ranking aggregation, is mainly studied in the field of social choice theory but is extensively applied in a variety of contexts. Among the most prominent methods for ranking aggregation, the Kemeny method has been proved to be the only one that simultaneously satisfies desirable properties such as neutrality, consistency, and the Condorcet condition. Unfortunately, the problem of finding a Kemeny ranking is NP-hard, which prevents practitioners from using it in real-life problems. The state of the art in exact algorithms for computing the Kemeny ranking experienced a major boost last year with the presentation of an algorithm that provides a search-time guarantee for up to 13 alternatives. In this work, we propose an enhanced version of this algorithm based on pruning the search space when certain Condorcet properties hold. This enhanced version greatly improves performance in terms of runtime.
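
To ground the discussion, a Kemeny ranking minimizes the total Kendall tau distance (the number of pairwise disagreements) to the input rankings. The brute-force Python sketch below enumerates all m! candidate rankings, which is feasible only for a handful of alternatives, which is precisely why pruning-based exact algorithms like the one enhanced here matter.

```python
from itertools import combinations, permutations

def kendall_tau(r1, r2):
    """Number of pairwise disagreements between two rankings
    (each ranking is a tuple of alternatives, best first)."""
    pos1 = {a: i for i, a in enumerate(r1)}
    pos2 = {a: i for i, a in enumerate(r2)}
    return sum(
        1
        for a, b in combinations(r1, 2)
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0
    )

def kemeny_ranking(votes):
    """Exhaustive Kemeny aggregation: return a ranking minimising the
    sum of Kendall tau distances to all votes."""
    alternatives = votes[0]
    return min(
        permutations(alternatives),
        key=lambda r: sum(kendall_tau(r, v) for v in votes),
    )

# Example: kemeny_ranking([('a','b','c'), ('b','a','c'), ('a','c','b')])
```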

