Label Propagation-Based Parallel Graph Partitioning for Large-Scale Graph Data

IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 72801-72813
Author(s):  
Minho Bae ◽  
Minjoong Jeong ◽  
Sangyoon Oh
2014 ◽  
Vol 912-914 ◽  
pp. 1309-1312
Author(s):  
Jin Xu ◽  
Yu Zhong ◽  
Bo Peng

With the emergence of large social networks such as Facebook and Twitter, graphs with millions to billions of vertices are common. Instead of processing the network on a single machine, the related applications are intended to run in a distributed way on a cluster of commodity machines. In this paper, we study the parallel graph partitioning problem, which is a fundamental operation for large graphs. With the help of Hadoop/MapReduce, we propose a parallel k-way partitioning approach. Unlike previous approaches, which require enough memory to hold the whole graph, our approach removes this limitation. Also, due to its distributed nature, it is easy to integrate our partitioning approach into existing parallel platforms. We conduct extensive experiments on real and synthetic graphs. The experimental results demonstrate the effectiveness and efficiency of our approach.


2021 ◽  
Vol 5 (1) ◽  
pp. 14
Author(s):  
Christos Makris ◽  
Georgios Pispirigos

Nowadays, due to the extensive use of information networks in a broad range of fields, e.g., bioinformatics, sociology, digital marketing, and computer science, graph theory applications have attracted significant scientific interest. Due to its apparent abstraction, community detection has become one of the most thoroughly studied graph partitioning problems. However, the existing algorithms principally propose iterative solutions of high polynomial order that repeatedly require exhaustive analysis. These methods can be considered overly resource-demanding, unscalable, and inapplicable to big data graphs, such as today’s social networks. In this article, a novel, near-linear, and highly scalable community prediction methodology is introduced. Specifically, using a distributed, stacking-based model, which is built on plain network topology characteristics of bootstrap-sampled subgraphs, the underlying community hierarchy of any given social network is efficiently extracted regardless of its size and density. The effectiveness of the proposed methodology has been examined on numerous real-life social networks and shown to be superior to various similar approaches in terms of performance, stability, and accuracy.


2021 ◽  
Vol 12 (5) ◽  
pp. 1-25
Author(s):  
Shengwei Ji ◽  
Chenyang Bu ◽  
Lei Li ◽  
Xindong Wu

Graph edge partitioning, which is essential for the efficiency of distributed graph computation systems, divides a graph into several balanced partitions within a given size to minimize the number of vertices to be cut. Existing graph partitioning models can be classified into two categories: offline and streaming graph partitioning models. The former requires global graph information during the partitioning, which is expensive in terms of time and memory for large-scale graphs. The latter creates partitions based solely on the received graph information. However, the streaming model may result in a lower partitioning quality compared with the offline model. Therefore, this study introduces a Local Graph Edge Partitioning model, which considers only the local information (i.e., a portion of a graph instead of the entire graph) during the partitioning. Considering only the local graph information is meaningful because acquiring complete information for large-scale graphs is expensive. Based on the Local Graph Edge Partitioning model, two local graph edge partitioning algorithms—Two-stage Local Partitioning and Adaptive Local Partitioning—are given. Experimental results obtained on 14 real-world graphs demonstrate that the proposed algorithms outperform rival algorithms in most tested cases. Furthermore, the proposed algorithms are proven to significantly improve the efficiency of the real graph computation system GraphX.
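The objective described above (edges assigned to partitions, with as few vertices as possible split across partitions) can be made concrete with a small measurement sketch. This is not the Two-stage or Adaptive Local Partitioning algorithm from the abstract; it only evaluates a given edge assignment. The function name `edge_partition_stats` and the use of the replication factor as a summary metric are assumptions introduced here for illustration.

```python
from collections import defaultdict

def edge_partition_stats(edge_to_part):
    """Summarise an edge partitioning (hypothetical helper, not from the paper).

    edge_to_part maps an undirected edge (u, v) to a partition id.
    Returns (cut_vertices, replication_factor, load_per_partition).
    """
    parts_of = defaultdict(set)   # vertex -> partitions holding one of its edges
    load = defaultdict(int)       # partition -> number of edges assigned to it
    for (u, v), p in edge_to_part.items():
        parts_of[u].add(p)
        parts_of[v].add(p)
        load[p] += 1
    if not parts_of:
        return [], 0.0, {}
    # A vertex is "cut" when its edges are spread over more than one partition.
    cut_vertices = sorted(v for v, ps in parts_of.items() if len(ps) > 1)
    # Average number of partitions in which a vertex appears.
    replication_factor = sum(len(ps) for ps in parts_of.values()) / len(parts_of)
    return cut_vertices, replication_factor, dict(load)

# Path 0-1-2-3 with the last edge placed in a second partition.
assignment = {(0, 1): 0, (1, 2): 0, (2, 3): 1}
cut, rf, load = edge_partition_stats(assignment)
print(cut, rf, load)
```

In this example only vertex 2 has edges in both partitions, so it is the single cut vertex; the per-partition edge counts indicate how balanced the assignment is.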


Author(s):  
Bao Bing-Kun ◽  
Yan Shuicheng

Graph-based learning provides a useful approach for modeling data in image annotation problems. In this chapter, the authors describe how to construct a region-based graph to annotate large-scale multi-label images. It is well recognized that analysis at the semantic region level may greatly improve image annotation performance compared with analysis at the whole-image level. However, the region-level approach increases the data scale by several orders of magnitude and poses new challenges to most existing algorithms. To this end, each image is first encoded as a Bag-of-Regions based on multiple image segmentations. Then, all image regions are assembled into a large k-nearest-neighbor graph using an efficient Locality Sensitive Hashing (LSH) method. Finally, a sparse and region-aware image-based graph is fed into the multi-label extension of the Entropic graph regularized semi-supervised learning algorithm (Subramanya & Bilmes, 2009). In combination, these steps naturally yield the capability to handle large-scale datasets. Extensive experiments on the NUS-WIDE (260k images) and COREL-5k datasets validate the effectiveness and efficiency of the framework for region-aware and scalable multi-label propagation.


2019 ◽  
Vol 4 (1) ◽  
Author(s):  
Antonio Maria Fiscarelli ◽  
Matthias R. Brust ◽  
Grégoire Danoy ◽  
Pascal Bouvry

The objective of a community detection algorithm is to group similar nodes that are more connected to each other than to the rest of the network. Several methods have been proposed, but many are of high complexity and require global knowledge of the network, which makes them less suitable for large-scale networks. The Label Propagation Algorithm initially assigns a distinct label to each node; each node then iteratively updates its label to the one held by the majority of its neighbors, until consensus is reached among all nodes in the network. Nodes sharing the same label are then grouped into communities. It runs in near-linear time and is decentralized, but it easily gets stuck in local optima and often returns a single giant community. To overcome these problems we propose MemLPA, a variation of the classical Label Propagation Algorithm in which each node implements a memory mechanism that allows it to “remember” past states of the network and uses a decision rule that takes this information into account. We demonstrate through extensive experiments, on the Lancichinetti-Fortunato-Radicchi benchmark and a set of real-world networks, that MemLPA outperforms other existing label propagation algorithms that implement memory, as well as some of the well-known community detection algorithms. We also perform a topological analysis to extend the performance study and compare the topological properties of the communities found to the ground-truth community structure.
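The classical procedure summarised above (distinct initial labels, repeated majority updates, grouping by final label) can be sketched as follows. This is a minimal illustration of plain label propagation, not of MemLPA, whose memory mechanism the abstract does not specify. The function name `label_propagation`, the asynchronous update order, the random tie-break, and the rule of keeping the current label when it is already among the majority labels are all assumptions made for this sketch.

```python
import random
from collections import Counter

def label_propagation(adj, max_iters=100, seed=0):
    """Plain asynchronous label propagation (illustrative sketch).

    adj maps each node to a list of its neighbours (undirected graph).
    Returns a dict node -> final label.
    """
    rng = random.Random(seed)
    labels = {v: v for v in adj}      # every node starts with its own label
    nodes = list(adj)
    for _ in range(max_iters):
        rng.shuffle(nodes)            # visit nodes in a random order each sweep
        changed = False
        for v in nodes:
            if not adj[v]:
                continue              # isolated nodes keep their own label
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == best]
            if labels[v] in candidates:
                continue              # already holds a majority label
            labels[v] = rng.choice(candidates)   # random tie-break (assumption)
            changed = True
        if not changed:
            break                     # every node agrees with its neighbourhood
    return labels

# Two disjoint triangles: each one ends up sharing a single label.
graph = {
    0: [1, 2], 1: [0, 2], 2: [0, 1],
    3: [4, 5], 4: [3, 5], 5: [3, 4],
}
print(label_propagation(graph))
```

Because labels only travel along edges, the two triangles can never adopt each other's labels, so the sketch recovers them as separate communities.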


2019 ◽  
Vol 33 (30) ◽  
pp. 1950363
Author(s):  
Chen Song ◽  
Guoyan Huang ◽  
Bo Yin ◽  
Bing Zhang ◽  
Xinqian Liu

The label propagation algorithm (LPA) attracts wide attention in the community detection field for its near-linear time complexity on large-scale networks. However, the algorithm adopts a random selection scheme in its label updating strategy, which results in unstable division and poor accuracy. In this paper, five different indicators of node similarity, based on local network information, are introduced to distinguish nodes, and a new label updating method is proposed. When there are multiple maximum neighbor labels in the propagation process, the maximum label corresponding to the most similar node is selected for updating instead of a random one. Five improved forms of LPA are proposed, named SAL-LPA, SOR-LPA, JAC-LPA, HDI-LPA and HPI-LPA. The experimental results on real-world and artificial benchmark networks show that the improved LPA greatly improves the performance of the original algorithm, among which HPI-LPA performs best.
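The tie-breaking rule described above can be sketched as a single label-selection step. The abstract does not define its five similarity indicators, so this example assumes a Jaccard index over neighbour sets as a stand-in; the function names `jaccard` and `pick_label` are hypothetical and are not taken from the paper.

```python
from collections import Counter

def jaccard(adj, u, v):
    """Jaccard index of the neighbour sets of u and v (assumed indicator)."""
    nu, nv = set(adj[u]), set(adj[v])
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0

def pick_label(adj, labels, v, similarity=jaccard):
    """Choose the next label for v, breaking ties by neighbour similarity."""
    if not adj[v]:
        return labels[v]              # isolated node keeps its label
    counts = Counter(labels[u] for u in adj[v])
    best = max(counts.values())
    tied = {lab for lab, c in counts.items() if c == best}
    if len(tied) == 1:
        return next(iter(tied))       # unique majority label, no tie to break
    # Among neighbours carrying a tied label, follow the one most similar to v
    # instead of choosing at random.
    candidates = [u for u in adj[v] if labels[u] in tied]
    most_similar = max(candidates, key=lambda u: similarity(adj, u, v))
    return labels[most_similar]

# Node 0 sees labels A and B twice each; its A-neighbours share neighbours with it.
graph = {0: [1, 2, 3, 5], 1: [0, 3], 2: [0, 4], 3: [0, 1], 4: [2], 5: [0]}
labels = {0: "Z", 1: "A", 2: "B", 3: "A", 4: "B", 5: "B"}
print(pick_label(graph, labels, 0))
```

Swapping in another indicator only requires passing a different `similarity` function with the same `(adj, u, v)` signature, and the choice becomes deterministic for a fixed graph, which is the stability property the abstract targets.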


Author(s):  
Keita Iwabuchi ◽  
Scott Sallinen ◽  
Roger Pearce ◽  
Brian Van Essen ◽  
Maya Gokhale ◽  
...  
