Fast Subgraph Matching on Large Graphs using Graphics Processors

Author(s):  
Ha-Nguyen Tran ◽  
Jung-jae Kim ◽  
Bingsheng He
2013 ◽  
Vol E96.D (3) ◽  
pp. 624-633
Author(s):  
Song-Hyon KIM ◽  
Inchul SONG ◽  
Kyong-Ha LEE ◽  
Yoon-Joon LEE

Author(s):  
Wei Chen ◽  
Jia Liu ◽  
Ziyang Chen ◽  
Xian Tang ◽  
Kaiyu Li

Top-K subgraph matching is one of the hot research issues in graph data management, which is to find, from the data graph, K subgraphs isomorphic to the query graph with the largest sum of weights. The existing methods of Top-K subgraph matching on large graphs usually use the filter-and-verify strategy. However, they all suffer from inefficiency in both stages. In the filtering stage, there exists repeated enumeration of vertices and the excessive memory cost of the filtering. In the verification stage, there exists redundant verification. Regarding to the above problems, we propose to use the preprocessing of the graph compression based on equivalent vertices to reduce the enumeration. In the filtering stage, we propose to reduce the memory cost by only considering the direct neighbors. In the verification stage, we take the vertex with the minimum number of candidate vertices in the query graph as the start vertex of the matching order, and use the idea of Ranking While Matching (RWM) to terminate the execution of the algorithm as early as possible by estimating the upper bound of the weights, so as to reduce redundant verification and improve the overall performance. Finally, the experimental results show that our method is much more efficient than existing methods in compression and the processing time.


Author(s):  
Alfredo Ferro ◽  
Rosalba Giugno ◽  
Alfredo Pulvirenti ◽  
Dennis Shasha

From biochemical applications to social networks, graphs represent data. Comparing graphs or searching for motifs on such data often reveals interesting and useful patterns. Most of the problems on graphs are known to be NP-complete. Because of the computational complexity of subgraph matching, reducing the candidate graphs or restricting the space in which to search for motifs is critical to achieving efficiency. Therefore, to optimize and engineer isomorphism algorithms, design indexing and suitable search methods for large graphs are the main directions investigated in the graph searching area. This chapter focuses on the key concepts underlying the existing algorithms. First it reviews the most known used algorithms to compare two algorithms and then it describes the algorithms to search on large graphs making emphasis on their application on biological area.


Author(s):  
Shuai Mu ◽  
Chenxi Wang ◽  
Ming Liu ◽  
Dongdong Li ◽  
Maohua Zhu ◽  
...  

2021 ◽  
Vol 15 (4) ◽  
Author(s):  
Yun Peng ◽  
Xin Lin ◽  
Byron Choi ◽  
Bingsheng He

2021 ◽  
Vol 15 (6) ◽  
pp. 1-27
Author(s):  
Marco Bressan ◽  
Stefano Leucci ◽  
Alessandro Panconesi

We address the problem of computing the distribution of induced connected subgraphs, aka graphlets or motifs , in large graphs. The current state-of-the-art algorithms estimate the motif counts via uniform sampling by leveraging the color coding technique by Alon, Yuster, and Zwick. In this work, we extend the applicability of this approach by introducing a set of algorithmic optimizations and techniques that reduce the running time and space usage of color coding and improve the accuracy of the counts. To this end, we first show how to optimize color coding to efficiently build a compact table of a representative subsample of all graphlets in the input graph. For 8-node motifs, we can build such a table in one hour for a graph with 65M nodes and 1.8B edges, which is times larger than the state of the art. We then introduce a novel adaptive sampling scheme that breaks the “additive error barrier” of uniform sampling, guaranteeing multiplicative approximations instead of just additive ones. This allows us to count not only the most frequent motifs, but also extremely rare ones. For instance, on one graph we accurately count nearly 10.000 distinct 8-node motifs whose relative frequency is so small that uniform sampling would literally take centuries to find them. Our results show that color coding is still the most promising approach to scalable motif counting.


2021 ◽  
Vol 15 (5) ◽  
pp. 1-52
Author(s):  
Lorenzo De Stefani ◽  
Erisa Terolli ◽  
Eli Upfal

We introduce Tiered Sampling , a novel technique for estimating the count of sparse motifs in massive graphs whose edges are observed in a stream. Our technique requires only a single pass on the data and uses a memory of fixed size M , which can be magnitudes smaller than the number of edges. Our methods address the challenging task of counting sparse motifs—sub-graph patterns—that have a low probability of appearing in a sample of M edges in the graph, which is the maximum amount of data available to the algorithms in each step. To obtain an unbiased and low variance estimate of the count, we partition the available memory into tiers (layers) of reservoir samples. While the base layer is a standard reservoir sample of edges, other layers are reservoir samples of sub-structures of the desired motif. By storing more frequent sub-structures of the motif, we increase the probability of detecting an occurrence of the sparse motif we are counting, thus decreasing the variance and error of the estimate. While we focus on the designing and analysis of algorithms for counting 4-cliques, we present a method which allows generalizing Tiered Sampling to obtain high-quality estimates for the number of occurrence of any sub-graph of interest, while reducing the analysis effort due to specific properties of the pattern of interest. We present a complete analytical analysis and extensive experimental evaluation of our proposed method using both synthetic and real-world data. Our results demonstrate the advantage of our method in obtaining high-quality approximations for the number of 4 and 5-cliques for large graphs using a very limited amount of memory, significantly outperforming the single edge sample approach for counting sparse motifs in large scale graphs.


Sign in / Sign up

Export Citation Format

Share Document