massive graphs Latest Research Papers

Cohesive subgraph identification is a fundamental problem in bipartite graph analysis. In real applications, to better represent the co-relationship between entities, edges are usually associated with weights or frequencies, which are neglected by most existing research. To fill the gap, we propose a new cohesive subgraph model, (k,ω)-core, by considering both subgraph cohesiveness and frequency for weighted bipartite graphs. Specifically, (k,ω)-core requires each node on the left layer to have at least k neighbors (cohesiveness) and each node on the right layer to have a weight of at least ω (frequency). In real scenarios, different users may have different parameter requirements. To handle massive graphs and queries, index-based strategies are developed. In addition, effective optimization techniques are proposed to improve the index construction phase. Compared with the baseline, extensive experiments on six datasets validate the superiority of our proposed methods.

Download Full-text

Towards Computing a Near-Maximum Weighted Independent Set on Massive Graphs

Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining ◽

10.1145/3447548.3467232 ◽

2021 ◽

Author(s):

Jiewei Gu ◽

Weiguo Zheng ◽

Yuzheng Cai ◽

Peng Peng

Keyword(s):

Independent Set ◽

Massive Graphs

Download Full-text

Local clustering via approximate heat kernel PageRank with subgraph sampling

Scientific Reports ◽

10.1038/s41598-021-95250-w ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Zhenqi Lu ◽

Johan Wahlström ◽

Arye Nehorai

Keyword(s):

Heat Kernel ◽

Clustering Algorithm ◽

State Of The Art ◽

Graph Clustering ◽

Approximate Algorithms ◽

Local Cluster ◽

Massive Graphs ◽

Approximation Errors ◽

Local Clustering ◽

Large Systems

AbstractGraph clustering, a fundamental technique in network science for understanding structures in complex systems, presents inherent problems. Though studied extensively in the literature, graph clustering in large systems remains particularly challenging because massive graphs incur a prohibitively large computational load. The heat kernel PageRank provides a quantitative ranking of nodes, and a local cluster can be efficiently found by performing a sweep over the heat kernel PageRank vector. But computing an exact heat kernel PageRank vector may be expensive, and approximate algorithms are often used instead. Most approximate algorithms compute the heat kernel PageRank vector on the whole graph, and thus are dependent on global structures. In this paper, we present an algorithm for approximating the heat kernel PageRank on a local subgraph. Moreover, we show that the number of computations required by the proposed algorithm is sublinear in terms of the expected size of the local cluster of interest, and that it provides a good approximation of the heat kernel PageRank, with approximation errors bounded by a probabilistic guarantee. Numerical experiments verify that the local clustering algorithm using our approximate heat kernel PageRank achieves state-of-the-art performance.

Download Full-text

A New Upper Bound Based on Vertex Partitioning for the Maximum K-plex Problem

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/233 ◽

2021 ◽

Author(s):

Hua Jiang ◽

Dongming Zhu ◽

Zhichao Xie ◽

Shaowen Yao ◽

Zhang-Hua Fu

Keyword(s):

Branch And Bound ◽

Upper Bound ◽

Undirected Graph ◽

State Of The Art ◽

Search Space ◽

Exact Algorithms ◽

Induced Subgraph ◽

Massive Graphs ◽

Vertex Set ◽

Number Of Branches

Given an undirected graph, the Maximum k-plex Problem (MKP) is to find a largest induced subgraph in which each vertex has at most k−1 non-adjacent vertices. The problem arises in social network analysis and has found applications in many important areas employing graph-based data mining. Existing exact algorithms usually implement a branch-and-bound approach that requires a tight upper bound to reduce the search space. In this paper, we propose a new upper bound for MKP, which is a partitioning of the candidate vertex set with respect to the constructing solution. We implement a new branch-and-bound algorithm that employs the upper bound to reduce the number of branches. Experimental results show that the upper bound is very effective in reducing the search space. The new algorithm outperforms the state-of-the-art algorithms significantly on real-world massive graphs, DIMACS graphs and random graphs.

Download Full-text

Tiered Sampling

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3441299 ◽

2021 ◽

Vol 15 (5) ◽

pp. 1-52

Author(s):

Lorenzo De Stefani ◽

Erisa Terolli ◽

Eli Upfal

Keyword(s):

Large Scale ◽

Analysis Of Algorithms ◽

Base Layer ◽

Single Edge ◽

Real World Data ◽

High Quality ◽

Large Graphs ◽

Massive Graphs ◽

Variance Estimate ◽

Low Probability

We introduce Tiered Sampling , a novel technique for estimating the count of sparse motifs in massive graphs whose edges are observed in a stream. Our technique requires only a single pass on the data and uses a memory of fixed size M , which can be magnitudes smaller than the number of edges. Our methods address the challenging task of counting sparse motifs—sub-graph patterns—that have a low probability of appearing in a sample of M edges in the graph, which is the maximum amount of data available to the algorithms in each step. To obtain an unbiased and low variance estimate of the count, we partition the available memory into tiers (layers) of reservoir samples. While the base layer is a standard reservoir sample of edges, other layers are reservoir samples of sub-structures of the desired motif. By storing more frequent sub-structures of the motif, we increase the probability of detecting an occurrence of the sparse motif we are counting, thus decreasing the variance and error of the estimate. While we focus on the designing and analysis of algorithms for counting 4-cliques, we present a method which allows generalizing Tiered Sampling to obtain high-quality estimates for the number of occurrence of any sub-graph of interest, while reducing the analysis effort due to specific properties of the pattern of interest. We present a complete analytical analysis and extensive experimental evaluation of our proposed method using both synthetic and real-world data. Our results demonstrate the advantage of our method in obtaining high-quality approximations for the number of 4 and 5-cliques for large graphs using a very limited amount of memory, significantly outperforming the single edge sample approach for counting sparse motifs in large scale graphs.

Download Full-text