Label Propagation-Based Parallel Graph Partitioning for Large-Scale Graph Data

With the emergence of large social networks such as Facebook and Twitter, graphs with millions to billions of vertices are common. Rather than processing such a network on a single machine, the related applications are designed to run in a distributed way on a cluster of commodity machines. In this paper, we study the parallel graph partitioning problem, which is a fundamental operation for large graphs. Using Hadoop/MapReduce, we propose a parallel k-way partitioning approach. Unlike previous approaches, which require enough memory to hold the whole graph, our approach removes this limitation. Also, because of its distributed nature, our partitioning approach is easy to integrate into existing parallel platforms. We conduct extensive experiments on real and synthetic graphs. The experimental results demonstrate the effectiveness and efficiency of our approach.

Stacked Community Prediction: A Distributed Stacking-Based Community Extraction Methodology for Large Scale Social Networks

Big Data and Cognitive Computing ◽

10.3390/bdcc5010014 ◽

2021 ◽

Vol 5 (1) ◽

pp. 14

Author(s):

Christos Makris ◽

Georgios Pispirigos

Keyword(s):

Social Networks ◽

Graph Partitioning ◽

Large Scale ◽

Real Life ◽

Information Networks ◽

Digital Marketing ◽

Partitioning Problems ◽

Iterative Solutions ◽

Community Extraction ◽

Stability And Accuracy

Due to the extensive use of information networks in a broad range of fields, e.g., bioinformatics, sociology, digital marketing, and computer science, graph theory applications have attracted significant scientific interest. Due to its apparent abstraction, community detection has become one of the most thoroughly studied graph partitioning problems. However, the existing algorithms principally propose iterative solutions of high polynomial order that repeatedly require exhaustive analysis. These methods are resource-intensive, do not scale, and are inapplicable to big data graphs such as today’s social networks. In this article, a novel, near-linear, and highly scalable community prediction methodology is introduced. Specifically, using a distributed, stacking-based model built on plain network topology characteristics of bootstrap-sampled subgraphs, the underlying community hierarchy of any given social network is efficiently extracted regardless of its size and density. The effectiveness of the proposed methodology has been examined on numerous real-life social networks and found superior to various similar approaches in terms of performance, stability, and accuracy.

Parallel Graph Partitioning on Multicore Architectures

Languages and Compilers for Parallel Computing - Lecture Notes in Computer Science ◽

10.1007/978-3-642-19595-2_17 ◽

2011 ◽

pp. 246-260 ◽

Cited By ~ 11

Author(s):

Xin Sui ◽

Donald Nguyen ◽

Martin Burtscher ◽

Keshav Pingali

Keyword(s):

Graph Partitioning ◽

Multicore Architectures ◽

Parallel Graph

Marbor: A novel large-scale graph data storage and processing framework

2014 IEEE 33rd International Performance Computing and Communications Conference (IPCCC) ◽

10.1109/pccc.2014.7017031 ◽

2014 ◽

Author(s):

Wei Zhou ◽

Yun Gao ◽

Jizhong Han ◽

Zhiyong Xu

Keyword(s):

Data Storage ◽

Large Scale ◽

Graph Data ◽

Processing Framework

Local Graph Edge Partitioning

ACM Transactions on Intelligent Systems and Technology ◽

10.1145/3466685 ◽

2021 ◽

Vol 12 (5) ◽

pp. 1-25

Author(s):

Shengwei Ji ◽

Chenyang Bu ◽

Lei Li ◽

Xindong Wu

Keyword(s):

Real World ◽

Graph Partitioning ◽

Large Scale ◽

Complete Information ◽

Local Information ◽

Experimental Results ◽

Two Stage ◽

Graph Computation ◽

Local Graph ◽

Edge Partitioning

Graph edge partitioning, which is essential for the efficiency of distributed graph computation systems, divides a graph into several balanced partitions within a given size to minimize the number of vertices to be cut. Existing graph partitioning models can be classified into two categories: offline and streaming graph partitioning models. The former requires global graph information during the partitioning, which is expensive in terms of time and memory for large-scale graphs. The latter creates partitions based solely on the received graph information. However, the streaming model may result in a lower partitioning quality compared with the offline model. Therefore, this study introduces a Local Graph Edge Partitioning model, which considers only the local information (i.e., a portion of a graph instead of the entire graph) during the partitioning. Considering only the local graph information is meaningful because acquiring complete information for large-scale graphs is expensive. Based on the Local Graph Edge Partitioning model, two local graph edge partitioning algorithms—Two-stage Local Partitioning and Adaptive Local Partitioning—are given. Experimental results obtained on 14 real-world graphs demonstrate that the proposed algorithms outperform rival algorithms in most tested cases. Furthermore, the proposed algorithms are proven to significantly improve the efficiency of the real graph computation system GraphX.
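
To make the objective in this abstract concrete, the sketch below measures an edge partitioning: which vertices are cut (their edges land in more than one partition) and how many edges each partition holds. It is not the paper's Two-stage or Adaptive Local Partitioning algorithm; the function name `edge_partition_stats` and the replication-factor figure it returns are assumptions added for illustration.

```python
from collections import defaultdict


def edge_partition_stats(assignment):
    """Summarise an edge partitioning (hypothetical helper).

    `assignment` maps each edge (u, v) to a partition id.
    Returns (cut_vertices, partition_sizes, replication_factor).
    """
    parts_of = defaultdict(set)   # vertex -> partitions that hold one of its edges
    sizes = defaultdict(int)      # partition id -> number of edges
    for (u, v), part in assignment.items():
        parts_of[u].add(part)
        parts_of[v].add(part)
        sizes[part] += 1

    # A vertex is cut when its edges are spread over several partitions.
    cut_vertices = sorted(v for v, ps in parts_of.items() if len(ps) > 1)

    # Average number of partitions per vertex (1.0 means nothing is cut).
    replication = sum(len(ps) for ps in parts_of.values()) / len(parts_of)
    return cut_vertices, dict(sizes), replication


# Path 0-1-2-3 split into two partitions; vertex 2 sits on the boundary.
stats = edge_partition_stats({(0, 1): 0, (1, 2): 0, (2, 3): 1})
```

A partitioner in the sense of the abstract would try to keep the partition sizes within the given bound while keeping the list of cut vertices short.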

Region-Based Graph Learning towards Large Scale Image Annotation

Graph-Based Methods in Computer Vision ◽

10.4018/978-1-4666-1891-6.ch013 ◽

2012 ◽

pp. 244-260

Author(s):

Bao Bing-Kun ◽

Yan Shuicheng

Keyword(s):

Large Scale ◽

Nearest Neighbor ◽

Image Annotation ◽

Learning Algorithm ◽

Label Propagation ◽

Locality Sensitive Hashing ◽

K Nearest Neighbor ◽

Neighbor Graph ◽

Nearest Neighbor Graph ◽

Modeling Data

Graph-based learning provides a useful approach for modeling data in image annotation problems. In this chapter, the authors show how to construct a region-based graph to annotate large-scale multi-label images. It is well recognized that analysis at the semantic region level may greatly improve image annotation performance compared with analysis at the whole-image level. However, the region-level approach increases the data scale by several orders of magnitude and poses new challenges to most existing algorithms. To this end, each image is first encoded as a Bag-of-Regions based on multiple image segmentations. Then, all image regions are assembled into a large k-nearest-neighbor graph using an efficient Locality Sensitive Hashing (LSH) method. Finally, a sparse and region-aware image-based graph is fed into the multi-label extension of the Entropic graph regularized semi-supervised learning algorithm (Subramanya & Bilmes, 2009). In combination, these steps make the framework capable of handling large-scale datasets. Extensive experiments on the NUS-WIDE (260k images) and COREL-5k datasets validate the effectiveness and efficiency of the framework for region-aware and scalable multi-label propagation.

Local memory boosts label propagation for community detection

Applied Network Science ◽

10.1007/s41109-019-0210-8 ◽

2019 ◽

Vol 4 (1) ◽

Cited By ~ 1

Author(s):

Antonio Maria Fiscarelli ◽

Matthias R. Brust ◽

Grégoire Danoy ◽

Pascal Bouvry

Keyword(s):

Community Detection ◽

Large Scale ◽

Linear Time ◽

Topological Analysis ◽

Detection Algorithm ◽

Label Propagation ◽

Performance Study ◽

Global Knowledge ◽

Local Optima ◽

Propagation Algorithm

The objective of a community detection algorithm is to group similar nodes that are more connected to each other than to the rest of the network. Several methods have been proposed, but many are of high complexity and require global knowledge of the network, which makes them less suitable for large-scale networks. The Label Propagation Algorithm initially assigns a distinct label to each node; each node then iteratively updates its label to the one held by the majority of its neighbors, until consensus is reached among all nodes in the network. Nodes sharing the same label are then grouped into communities. It runs in near-linear time and is decentralized, but it easily gets stuck in local optima and often returns a single giant community. To overcome these problems we propose MemLPA, a variation of the classical Label Propagation Algorithm in which each node implements a memory mechanism that allows it to “remember” past states of the network and uses a decision rule that takes this information into account. We demonstrate through extensive experiments, on the Lancichinetti-Fortunato-Radicchi benchmark and a set of real-world networks, that MemLPA outperforms other existing label propagation algorithms that implement memory, as well as some well-known community detection algorithms. We also perform a topological analysis to extend the performance study and compare the topological properties of the communities found to the ground-truth community structure.
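
The classical Label Propagation Algorithm described in this abstract can be sketched in a few lines. This is a minimal illustration of the baseline only, not MemLPA: it has no memory mechanism. The asynchronous update order, the rule of keeping the current label when it is already among the most frequent ones, and the names `label_propagation`, `max_iters`, and `seed` are all assumptions made for the sketch.

```python
import random
from collections import Counter


def label_propagation(adj, max_iters=100, seed=0):
    """Classical LPA sketch. `adj` maps each node to a list of neighbours."""
    rng = random.Random(seed)
    labels = {v: v for v in adj}      # every node starts with a distinct label
    nodes = list(adj)
    for _ in range(max_iters):
        rng.shuffle(nodes)            # assumption: asynchronous, random order
        changed = False
        for v in nodes:
            if not adj[v]:
                continue              # isolated nodes keep their own label
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == best]
            # Update only if the current label is not already a majority label;
            # ties between majority labels are broken at random.
            if labels[v] not in candidates:
                labels[v] = rng.choice(candidates)
                changed = True
        if not changed:               # consensus: no node wants to change
            break
    return labels


# Two disjoint triangles end up as two communities.
communities = label_propagation(
    {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
)
```

The random tie-breaking in the update step is the source of the instability the abstract mentions: on graphs with weak community structure, different seeds can give different partitions.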

Label propagation algorithm based on node similarity driven by local information

International Journal of Modern Physics B ◽

10.1142/s0217979219503636 ◽

2019 ◽

Vol 33 (30) ◽

pp. 1950363

Author(s):

Chen Song ◽

Guoyan Huang ◽

Bo Yin ◽

Bing Zhang ◽

Xinqian Liu

Keyword(s):

Large Scale ◽

Linear Time ◽

Local Information ◽

Label Propagation ◽

Selection Scheme ◽

Original Algorithm ◽

Large Scale Network ◽

Propagation Algorithm ◽

Node Similarity ◽

Scale Network

The label propagation algorithm (LPA) attracts wide attention in the community detection field for its near-linear time complexity on large-scale networks. However, the algorithm adopts a random selection scheme in its label updating strategy, which results in unstable division and poor accuracy. In this paper, five different indicators of node similarity are introduced based on local network information to distinguish nodes, and a new label updating method is proposed. When there are multiple maximum neighbor labels in the propagation process, the maximum label corresponding to the most similar node is selected for updating instead of a random one. Five different forms of improved LPA are proposed, named SAL-LPA, SOR-LPA, JAC-LPA, HDI-LPA and HPI-LPA. The experimental results on real-world and artificial benchmark networks show that the improved LPA greatly improves the performance of the original algorithm, with HPI-LPA performing best.
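
The tie-breaking rule in this abstract can be sketched as follows. The abstract does not define its five similarity indicators, so the sketch assumes the Jaccard index over neighbor sets as a stand-in; the function names `jaccard` and `pick_label` are hypothetical, and node ids are assumed to be sortable so that remaining ties resolve deterministically.

```python
from collections import Counter


def jaccard(adj, u, v):
    """Jaccard index of the neighbour sets of u and v (assumed indicator)."""
    a, b = set(adj[u]), set(adj[v])
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pick_label(adj, labels, v):
    """Choose the next label for v; assumes v has at least one neighbour."""
    counts = Counter(labels[u] for u in adj[v])
    best = max(counts.values())
    tied = {lab for lab, c in counts.items() if c == best}
    if len(tied) == 1:
        return next(iter(tied))       # clear majority, no tie to break
    # Several labels are equally frequent: instead of a random pick, take
    # the label carried by the neighbour most similar to v.
    candidates = sorted(u for u in adj[v] if labels[u] in tied)
    most_similar = max(candidates, key=lambda u: jaccard(adj, u, v))
    return labels[most_similar]


# Node 0 sees labels 'a' and 'b' twice each; neighbours 1 and 3 share
# neighbours with node 0, so their label wins the tie.
adj = {0: [1, 2, 3, 4], 1: [0, 3], 2: [0], 3: [0, 1], 4: [0]}
labels = {0: "x", 1: "a", 2: "b", 3: "a", 4: "b"}
chosen = pick_label(adj, labels, 0)
```

Replacing the random choice with a similarity-driven one makes the update deterministic for a given graph, which is the property the abstract credits for more stable divisions.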
