Fair Graph Mining

Given a user-specified minimum degree threshold γ, a γ-quasi-clique is a subgraph g = (V_g, E_g) where each vertex v ∈ V_g connects to at least a γ fraction of the other vertices (i.e., ⌈γ · (|V_g| - 1)⌉ vertices) in g. The quasi-clique is one of the most natural definitions of a dense structure, useful for finding communities in social networks and discovering significant biomolecule structures and pathways. However, mining maximal quasi-cliques is notoriously expensive. In this paper, we design parallel algorithms for mining maximal quasi-cliques on G-thinker, a distributed graph mining framework that decomposes mining into compute-intensive tasks to fully utilize CPU cores. We found that directly using G-thinker results in a straggler problem due to (i) the drastic load imbalance among tasks and (ii) the difficulty of predicting task running time. We address these challenges by redesigning G-thinker's execution engine to prioritize long-running tasks for execution, and by using a novel timeout strategy that decomposes long-running tasks to improve load balancing. While this system redesign applies to many other expensive dense subgraph mining problems, this paper verifies the idea by adapting the state-of-the-art quasi-clique algorithm, Quick, to our redesigned G-thinker. Extensive experiments verify that our new solution scales well with the number of CPU cores, achieving a 201× runtime speedup when mining a graph with 3.77M vertices and 16.5M edges on a 16-node cluster.
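The degree condition in the definition above can be checked directly. The following is a minimal sketch, not the Quick algorithm or any part of G-thinker: the function name `is_quasi_clique`, the adjacency-set representation, and the toy graph are all assumptions made for illustration, and the graph is assumed to be undirected with no self-loops.

```python
import math

def is_quasi_clique(adj, vertices, gamma):
    """Return True if `vertices` induces a gamma-quasi-clique in the graph `adj`.

    `adj` maps each vertex to the set of its neighbors (hypothetical
    representation; undirected, no self-loops assumed).
    """
    members = set(vertices)
    if not members:
        return False
    # Each vertex needs at least ceil(gamma * (|V_g| - 1)) neighbors inside the subgraph.
    required = math.ceil(gamma * (len(members) - 1))
    return all(len(adj[v] & members) >= required for v in members)

# Toy graph: the 4-cycle a-b-c-d plus the chord a-c.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c"},
}

print(is_quasi_clique(adj, ["a", "b", "c", "d"], 0.6))  # each vertex needs 2 neighbors
print(is_quasi_clique(adj, ["a", "b", "c", "d"], 0.9))  # each vertex needs 3 neighbors
```

This only verifies a given vertex set; enumerating all maximal quasi-cliques is the expensive search problem the abstract addresses.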

VHINFGM: Virus-Host Interaction prediction via Network Fusion and Graph Mining

DOI: 10.1109/bibm52615.2021.9669642

2021

Author(s): Qiang Zhu, Qinghui Dai, Bangchao Wang, Jinxing Liang, Junping Liu, ...

Keyword(s): Graph Mining, Interaction Prediction, Host Interaction

A parallel graph partitioning algorithm to speed up the large-scale distributed graph mining

Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining Algorithms, Systems, Programming Models and Applications - BigMine '12

DOI: 10.1145/2351316.2351325

2012

Cited By ~ 4

Author(s): ZengFeng Zeng, Bin Wu, Haoyu Wang

Keyword(s): Graph Partitioning, Graph Mining, Large Scale, Speed Up, Partitioning Algorithm, Parallel Graph

Graph Mining on Streams

Encyclopedia of Database Systems

DOI: 10.1007/978-0-387-39940-9_184

2009

pp. 1271-1275

Cited By ~ 8

Author(s): Andrew McGregor

Keyword(s): Graph Mining

Introduction to Parallel Graph Mining

Practical Graph Mining with R

DOI: 10.1201/b15352-17

2013

pp. 441-488

Keyword(s): Graph Mining, Parallel Graph

A qualitative survey on frequent subgraph mining

Open Computer Science

DOI: 10.1515/comp-2018-0018

2018

Vol 8 (1)

pp. 194-209

Cited By ~ 1

Author(s): Büsra Güvenoglu, Belgin Ergenç Bostanoglu

Keyword(s): Data Mining, Graph Mining, Research Area, Heterogeneous Data, Graph Representation, Frequent Subgraph Mining, Subgraph Mining, Frequent Subgraph, Input Type, Frequent Subgraphs

Abstract: Data mining is a popular research area that focuses on finding unforeseen and important information in large databases. Graphs are one of the popular data structures used to represent large heterogeneous data in this field, so graph mining is one of the most popular subdivisions of data mining. Subgraphs that occur in a database more frequently than a user-defined threshold are called frequent subgraphs. Frequent subgraphs can give important information about the database; using this information, data can be classified, clustered and indexed. The purpose of this survey is (i) to examine frequent subgraph mining algorithms in terms of the phases of the frequent subgraph discovery process, such as candidate generation and frequency calculation, (ii) to categorize the algorithms according to their general attributes, such as input type, dynamicity of graphs, result type, the algorithmic approach they are based on, algorithmic design and graph representation, and (iii) to discuss the performance of the algorithms relative to each other and the challenges they have recently faced.
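To make the frequency calculation phase concrete, here is a minimal sketch restricted to the simplest patterns, single labeled edges. It is not taken from any algorithm covered by the survey; the function name `frequent_edge_patterns`, the `(labels, edges)` graph representation, and the toy database are hypothetical. The sketch also treats the threshold as inclusive (support >= threshold), which is a common convention but an assumption here.

```python
from collections import Counter

def frequent_edge_patterns(graph_db, min_support):
    """Count in how many graphs each labeled edge pattern occurs; keep the frequent ones.

    Each graph is a pair (labels, edges): `labels` maps vertex id to label,
    `edges` is a list of undirected (u, v) pairs (hypothetical representation).
    """
    support = Counter()
    for labels, edges in graph_db:
        # A pattern is an unordered pair of endpoint labels; count it once per graph.
        patterns = {tuple(sorted((labels[u], labels[v]))) for u, v in edges}
        support.update(patterns)
    return {p: s for p, s in support.items() if s >= min_support}

# Toy database of three small labeled graphs.
graph_db = [
    ({0: "C", 1: "O", 2: "N"}, [(0, 1), (0, 2)]),
    ({0: "C", 1: "O"}, [(0, 1)]),
    ({0: "C", 1: "N", 2: "N"}, [(0, 1), (1, 2)]),
]

frequent = frequent_edge_patterns(graph_db, min_support=2)
print(frequent)
```

Larger patterns additionally require candidate generation and subgraph isomorphism tests, which this sketch leaves out.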

Dynamic Graph Mining for Multi-weight Multi-destination Route Planning with Deadlines Constraints

Discovering configuration templates of virtualized tenant networks in multi-tenancy datacenters via graph-mining

GiRaF: robust, computational identification of influenza reassortments via graph mining

Graph mining assisted semi-supervised learning for fraudulent cash-out detection

Scalable mining of maximal quasi-cliques
