Lightweight Label Propagation for Large-Scale Network Data

Label propagation spreads soft labels from a few labeled data points to a large amount of unlabeled data according to the intrinsic graph structure. Nonetheless, most label propagation solutions work only on relatively small-scale data and fail to cope with many real applications, such as social network analysis, where graphs usually have millions of nodes. In this paper, we propose a novel algorithm named \algo to deal with large-scale data. A lightweight iterative process derived from the well-known stochastic gradient descent strategy is used to reduce memory overhead and accelerate the solving process. We also give a theoretical analysis of the necessity of the warm-start technique for label propagation. Experiments show that our algorithm can handle million-scale graphs in a few seconds while achieving performance that is highly competitive with existing algorithms.
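The abstract does not spell out the update rule, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes a plain adjacency-list graph and updates one randomly chosen unlabeled node per step by averaging its neighbours' soft labels, keeping the seed labels fixed. The function name `propagate_labels` and all parameters are hypothetical.

```python
import random

def propagate_labels(adj, seeds, num_classes, steps=2000, seed=0):
    """Stochastic, node-at-a-time label propagation (illustrative sketch).

    adj:   dict mapping node -> list of neighbour nodes (undirected graph)
    seeds: dict mapping labeled node -> class index (kept fixed throughout)
    """
    rng = random.Random(seed)
    # One soft-label vector per node; unlabeled nodes start uniform.
    soft = {v: [1.0 / num_classes] * num_classes for v in adj}
    for v, c in seeds.items():
        soft[v] = [1.0 if k == c else 0.0 for k in range(num_classes)]

    free = [v for v in adj if v not in seeds]
    if free:
        for _ in range(steps):
            v = rng.choice(free)  # touch a single node per step
            nbrs = adj[v]
            if not nbrs:
                continue
            # Replace the node's soft label by the mean of its neighbours'.
            soft[v] = [sum(soft[u][k] for u in nbrs) / len(nbrs)
                       for k in range(num_classes)]

    # Hard label = class with the largest soft score.
    return {v: max(range(num_classes), key=lambda k: soft[v][k]) for v in adj}

# Two triangles joined by the bridge 2-3, with one labeled node in each.
example_adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
               3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
example_labels = propagate_labels(example_adj, {0: 0, 5: 1}, num_classes=2)
```

Because each step reads only one node's neighbourhood, the working memory is the label table plus the graph itself, which is the kind of saving the abstract attributes to its lightweight iteration.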


Event Prediction Based On Large Scale Network Subgraph Convolution

10.21203/rs.3.rs-622956/v1 ◽

2021 ◽

Author(s):

XiaoWei Wu ◽

FanLiang Bu ◽

ZhiWen Hou

Keyword(s):

Euclidean Distance ◽

Large Scale ◽

Graph Embedding ◽

Large Scale Data ◽

Large Scale Network ◽

Event Prediction ◽

Convolution Algorithm ◽

Scale Network ◽

Data Graph ◽

Scale Data

Aiming at the problem of event prediction in large-scale event networks, a collapse subgraph convolution (CSGCN) algorithm is proposed, which uses an event subgraph to predict the subsequent events of an event group. The CSGCN algorithm collapses the edge-induced event subgraph in a large-scale event network, removes the irrelevant event nodes from the subgraph, and forms a new event subgraph. The GCN algorithm is used to learn the graph embedding representation of the event subgraph, and the subsequent events of the event group are predicted by comparing the similarity between the graph embedding representation of the event group and those of the subsequent events. Because only some related nodes are processed each time, applying the model to large-scale data graphs is feasible. Through experiments, we explore and verify the effectiveness of extracting features from subgraphs of a large-scale graph by using graph convolution training to obtain graph embedding representations. We find that GCN predicts events better than Euclidean distance and cosine similarity, which further shows that the graph convolution algorithm performs well in graph feature extraction.
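To make the two baseline measures concrete, here is a small sketch that ranks candidate subsequent events against an event-group vector by Euclidean distance and by cosine similarity (assumed here to be the angular measure the abstract refers to). The vectors are made up for illustration and are not GCN embeddings; `best_candidate` is a hypothetical helper.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def euclidean_distance(a, b):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def best_candidate(group_vec, candidates, measure="cosine"):
    """Return the name of the candidate closest to group_vec.

    candidates: dict mapping candidate name -> vector
    """
    if measure == "cosine":
        return max(candidates,
                   key=lambda name: cosine_similarity(group_vec, candidates[name]))
    return min(candidates,
               key=lambda name: euclidean_distance(group_vec, candidates[name]))

group = [1.0, 2.0]
events = {"A": [2.0, 4.0],   # same direction as the group, but farther away
          "B": [1.0, 1.0]}   # nearer in space, but pointing elsewhere
by_cosine = best_candidate(group, events, "cosine")        # picks "A"
by_euclidean = best_candidate(group, events, "euclidean")  # picks "B"
```

The two measures can disagree on the same data, which is why a comparison against both is informative.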


Label propagation algorithm based on node similarity driven by local information

International Journal of Modern Physics B ◽

10.1142/s0217979219503636 ◽

2019 ◽

Vol 33 (30) ◽

pp. 1950363

Author(s):

Chen Song ◽

Guoyan Huang ◽

Bo Yin ◽

Bing Zhang ◽

Xinqian Liu

Keyword(s):

Large Scale ◽

Linear Time ◽

Local Information ◽

Label Propagation ◽

Selection Scheme ◽

Original Algorithm ◽

Large Scale Network ◽

Propagation Algorithm ◽

Node Similarity ◽

Scale Network

The label propagation algorithm (LPA) attracts wide attention in the community detection field for its near-linear time complexity on large-scale networks. However, the algorithm adopts a random selection scheme in its label updating strategy, which results in unstable division and poor accuracy. In this paper, five different indicators of node similarity, based on local network information, are introduced to distinguish nodes, and a new label updating method is proposed. When several neighbor labels tie for the maximum count during propagation, the label of the most similar node among them is selected for updating instead of a random one. Five improved forms of LPA are proposed, named SAL-LPA, SOR-LPA, JAC-LPA, HDI-LPA and HPI-LPA. Experimental results on real-world and artificial benchmark networks show that the improved LPA greatly outperforms the original algorithm, with HPI-LPA performing best.
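The tie-breaking rule described above can be sketched as follows. This is an illustration only: it assumes the Jaccard index of neighbour sets as the similarity indicator (presumably what JAC-LPA uses, though the abstract does not define the indicators), and `choose_label` is a hypothetical name.

```python
from collections import Counter

def jaccard(adj, u, v):
    """Jaccard index of the neighbour sets of u and v."""
    a, b = set(adj[u]), set(adj[v])
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def choose_label(adj, labels, node):
    """New label for `node`: the most frequent neighbour label, with ties
    broken by the label of the most similar neighbour instead of at random."""
    nbrs = adj[node]
    if not nbrs:
        return labels[node]
    counts = Counter(labels[u] for u in nbrs)
    top = max(counts.values())
    tied = {lab for lab, c in counts.items() if c == top}
    # Among neighbours carrying a tied label, take the most similar one.
    best = max((u for u in nbrs if labels[u] in tied),
               key=lambda u: jaccard(adj, node, u))
    return labels[best]

# Node 0 sees labels a, a, b, b: a tie that plain LPA would resolve randomly.
example_adj = {0: [1, 2, 3, 5], 1: [0, 2], 2: [0, 1],
               3: [0, 4], 4: [3], 5: [0]}
example_labels = {0: "x", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
new_label = choose_label(example_adj, example_labels, 0)  # "a"
```

Nodes 1 and 2 share a neighbour with node 0 while nodes 3 and 5 do not, so the deterministic rule selects "a".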


Small-scale microwave background anisotropies implied by large-scale data

The Astrophysical Journal ◽

10.1086/172140 ◽

1993 ◽

Vol 402 ◽

pp. 369

Author(s):

A. Kashlinsky

Keyword(s):

Large Scale ◽

Small Scale ◽

Microwave Background ◽

Large Scale Data ◽

Scale Data ◽

Microwave Background Anisotropies


Developing a ‘Semi-Systematic’ Approach to Using Large-Scale Data-Sets for Small-Scale Interventions: The ‘Baby Matterz’ Initiative as a Case Study

The Urban Review ◽

10.1007/s11256-009-0144-z ◽

2010 ◽

Vol 43 (2) ◽

pp. 235-254

Author(s):

Mark O’Brien

Keyword(s):

Large Scale ◽

Systematic Approach ◽

Small Scale ◽

Data Sets ◽

Large Scale Data ◽

Scale Data ◽

Large Scale Data Sets


A Hierarchical and Abstraction-Based Blockchain Model

Applied Sciences ◽

10.3390/app9112343 ◽

2019 ◽

Vol 9 (11) ◽

pp. 2343 ◽

Cited By ~ 3

Author(s):

Swagatika Sahoo ◽

Akshay M. Fajge ◽

Raju Halder ◽

Agostino Cortesi

Keyword(s):

Large Scale ◽

Abstract Interpretation ◽

Organizational Structures ◽

Network Size ◽

Small Scale ◽

Time Interval ◽

Huge Number ◽

Large Scale Network ◽

Regular Time ◽

Scale Network

In the nine years since the launch of blockchain, and amid intense research, scalability has remained a serious concern, especially for large-scale networks that generate a huge number of transaction records. In this paper, we propose a hierarchical blockchain model characterized by: (1) each level maintains multiple local blockchain networks, (2) each local blockchain records local transactional activities, and (3) partial views (tunable w.r.t. precision) of different subsets of local blockchain records are maintained in the blockchains at the next level of the hierarchy. To meet this objective, we apply abstractions to the set of transaction records in a regular time interval by following the Abstract Interpretation framework, which provides tunable precision in various abstract domains and guarantees the soundness of the system. While this model fits real-world organizational structures well, the proposal is also powerful enough to scale when a large number of nodes participate in a network, resulting in enormous growth of the network size and of the number of transaction records. We discuss experimental results on a small-scale network with three sub-networks at the lower level, abstracting the transaction records in the abstract domain of intervals. The results are encouraging and clearly indicate the effectiveness of this approach in controlling the exponential growth of blockchain size w.r.t. the total number of participants in the network.
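As a rough illustration of abstracting transaction records in the domain of intervals, the sketch below summarizes the amounts seen in each fixed time window by a single `[lo, hi]` interval. The record layout `(timestamp, amount)`, the window length, and all names are assumptions made for this example; the abstract does not describe the model's actual data structures.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def contains(self, x):
        """Soundness check: is the concrete value covered by the interval?"""
        return self.lo <= x <= self.hi

    def join(self, other):
        """Least upper bound: the smallest interval covering both."""
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

def abstract_by_window(records, window):
    """Group (timestamp, amount) records into windows of length `window`
    and abstract each group to the interval spanned by its amounts."""
    buckets = {}
    for timestamp, amount in records:
        buckets.setdefault(timestamp // window, []).append(amount)
    return {w: Interval(min(vals), max(vals))
            for w, vals in sorted(buckets.items())}

records = [(0, 5.0), (3, 12.0), (7, 1.0), (12, 40.0), (15, 8.0)]
summary = abstract_by_window(records, window=10)
# A coarser view for a higher level: join the per-window intervals.
coarse = summary[0].join(summary[1])
```

Every concrete amount lies inside its window's interval, so the summary over-approximates rather than loses records; joining intervals trades precision for a smaller representation.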


A Topology Visualization Early Warning Distribution Algorithm for Large-Scale Network Security Incidents

The Scientific World JOURNAL ◽

10.1155/2013/827376 ◽

2013 ◽

Vol 2013 ◽

pp. 1-7

Author(s):

Hui He ◽

Guotao Fan ◽

Jianwei Ye ◽

Weizhe Zhang

Keyword(s):

Network Security ◽

Network Topology ◽

Early Warning ◽

Early Warning System ◽

Large Scale ◽

Warning System ◽

Small Scale ◽

Large Scale Network ◽

Scale Network ◽

Security Incidents

Research on early warning systems for large-scale network security incidents is of great significance: such a system can improve a network's emergency response capabilities, reduce the damage caused by cyber attacks, and strengthen the system's ability to counterattack. This paper presents a comprehensive early warning system that combines active measurement and anomaly detection, and focuses on the key visualization algorithm and technology of the system. Plane visualization of the large-scale network is realized with a divide-and-conquer approach. First, the topology of the large-scale network is divided into several small-scale networks by the MLkP/CR algorithm. Second, the subgraph plane visualization algorithm is applied to each small-scale network. Finally, the topologies of the small-scale networks are combined into a single topology by the automatic distribution algorithm based on force analysis. Because the algorithm transforms the large-scale network topology plane visualization problem into a series of small-scale network topology plane visualization and distribution problems, it has higher parallelism and can handle the display of ultra-large-scale network topologies.


Stochastic Gradient Descent Based K-Means Algorithm on Large Scale Data Clustering

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.687-691.1342 ◽

2014 ◽

Vol 687-691 ◽

pp. 1342-1345 ◽

Cited By ~ 1

Author(s):

Jie Ding ◽

Li Peng Zhu ◽

Bin Hu ◽

Ren Long Hang ◽

Yu Bao Sun

Keyword(s):

Gradient Descent ◽

Large Scale ◽

Clustering Algorithm ◽

Distance Matrix ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Data Sets ◽

Human Beings ◽

Large Scale Data ◽

Scale Data

With the rapid advance of data collection and storage techniques, it is easy to acquire data sets with tens of millions or even billions of instances. How to explore and exploit the useful or interesting information in these data sets has become an urgent issue. The traditional k-means clustering algorithm has been widely used in the data mining community. First, k clustering centres are randomly initialized. Then, all instances are classified into k different classes according to their distances to the clustering centres. Lastly, each clustering centre is updated to the mean of its constituent instances. This whole process is iterated until convergence. At each iteration, the distance matrix from all instances to the k clustering centres must be calculated, which costs a great deal of time on large-scale data sets. To address this issue, in this paper we propose a fast optimization algorithm based on stochastic gradient descent (SGD). At each iteration, an instance is randomly chosen, its corresponding clustering centre is found, and that centre is updated immediately. Experimental results show that our proposed method achieves competitive clustering results at a lower time cost.
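The per-instance update described above can be sketched in a few lines. This is a minimal illustration rather than the paper's implementation: the step size `1 / count` (which makes each centre the running mean of the instances assigned to it) is an assumption, as are the function name `sgd_kmeans` and its parameters.

```python
import random

def sgd_kmeans(points, k, steps=5000, seed=0):
    """Online (SGD-style) k-means: one random instance per iteration."""
    rng = random.Random(seed)
    # Initialize the centres with k distinct data points.
    centres = [list(p) for p in rng.sample(points, k)]
    counts = [1] * k  # each centre already "contains" its initial point
    for _ in range(steps):
        x = rng.choice(points)
        # Only k distances are computed per step, not a full distance matrix.
        j = min(range(k),
                key=lambda i: sum((a - c) ** 2 for a, c in zip(x, centres[i])))
        counts[j] += 1
        lr = 1.0 / counts[j]  # assumed step size: running-mean update
        centres[j] = [c + lr * (a - c) for a, c in zip(x, centres[j])]
    return centres

# Two well-separated groups of four points each.
example_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
                  (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
example_centres = sorted(sgd_kmeans(example_points, k=2))
```

Each step costs k distance computations regardless of the data set size, which is where the time saving over the batch update comes from.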
