Spectral Clustering in Heterogeneous Information Networks

A heterogeneous information network (HIN) is one whose objects are of different types and whose links can model different relations between objects. We study how spectral clustering can be effectively applied to HINs. In particular, we focus on how meta-path relations are used to construct an effective similarity matrix on which spectral clustering is then performed. We formulate the similarity matrix construction as an optimization problem and propose the SClump algorithm for solving it. We conduct extensive experiments comparing SClump with other state-of-the-art clustering algorithms on HINs. Our results show that SClump outperforms the competitors over a range of datasets w.r.t. different clustering quality measures.
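As a rough illustration of how a meta-path can yield a similarity matrix, the sketch below counts author-paper-author path instances and normalises them with PathSim, a commonly used meta-path measure. The abstract does not say which measure SClump builds on, so PathSim, the toy incidence matrix `W`, and the function names are all assumptions made for this example; it is not the SClump optimization itself.

```python
def commuting_matrix(author_paper):
    """Count A-P-A path instances: M = W * W^T for a binary author-paper matrix W."""
    n = len(author_paper)
    return [
        [sum(a * b for a, b in zip(author_paper[i], author_paper[j])) for j in range(n)]
        for i in range(n)
    ]


def pathsim(m):
    """PathSim(x, y) = 2 * M[x][y] / (M[x][x] + M[y][y]); 0 when both diagonals are 0."""
    n = len(m)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            denom = m[i][i] + m[j][j]
            sim[i][j] = 2.0 * m[i][j] / denom if denom else 0.0
    return sim


# Hypothetical toy data: 3 authors (rows) and 4 papers (columns).
W = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]
M = commuting_matrix(W)
S = pathsim(M)
```

A matrix such as `S` could then serve as one candidate input to a spectral clustering step; with several meta-paths there would be one such matrix per meta-path.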

A pareto ensemble based spectral clustering framework

Complex & Intelligent Systems ◽

10.1007/s40747-020-00215-7 ◽

2020 ◽

Author(s):

Juanjuan Luo ◽

Huadong Ma ◽

Dongqing Zhou

Keyword(s):

Phase I ◽

Phase II ◽

Spectral Clustering ◽

Clustering Algorithms ◽

Divide And Conquer ◽

Nonzero Entry ◽

Similarity Matrix ◽

Diversity Preservation ◽

Two Phases ◽

Matrix Construction

The similarity matrix has a significant effect on the performance of spectral clustering, and determining the neighborhood in the similarity matrix effectively is one of the main difficulties. In this paper, a “divide and conquer” strategy is proposed to model the similarity matrix construction task using a multiobjective evolutionary algorithm (MOEA). The whole procedure is divided into two phases: Phase I determines the nonzero entries of the similarity matrix, and Phase II determines the values of those nonzero entries. In Phase I, the main contribution is that we model the task as a biobjective dynamic optimization problem, which optimizes diversity and similarity at the same time. Each individual determines one nonzero entry for each sample, so the encoding length decreases to O(N) in contrast with non-ensemble multiobjective spectral clustering. In addition, a specific initialization operator and a diversity preservation strategy are proposed for this phase. In Phase II, three ensemble strategies are designed to determine the values of the nonzero entries of the similarity matrix. Furthermore, this Pareto ensemble framework is extended to semi-supervised clustering by transforming the semi-supervised information into constraints. In contrast with previous multiobjective evolutionary spectral clustering algorithms, the proposed Pareto ensemble-based framework strikes a balance between time cost and clustering accuracy, as demonstrated in the experiments section.
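The biobjective formulation rests on Pareto dominance: a candidate is kept when no other candidate is at least as good on both objectives and strictly better on one. The sketch below shows only that generic filter, assuming both objectives (labelled here as diversity and similarity scores) are maximised; the candidate scores are invented for illustration, and the paper's encoding, operators, and ensemble strategies are not reproduced.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one (maximisation)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (diversity, similarity) scores for five candidate solutions.
candidates = [(0.9, 0.2), (0.6, 0.6), (0.2, 0.9), (0.5, 0.5), (0.1, 0.1)]
front = pareto_front(candidates)
```

Here `(0.5, 0.5)` and `(0.1, 0.1)` are dropped because `(0.6, 0.6)` dominates both; the three remaining trade-off solutions are the kind of set an ensemble step could combine.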

Spectral Clustering Algorithm: MATLAB PCT-Based Parallel Design and Implementation

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.765-767.580 ◽

2013 ◽

Vol 765-767 ◽

pp. 580-584

Author(s):

Yu Yang ◽

Cheng Gui Zhao

Keyword(s):

Spectral Clustering ◽

Large Scale ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Parallel Structure ◽

Computational Time ◽

Similarity Matrix ◽

Data Intensive ◽

Clustering Quality ◽

Spectral Clustering Algorithm

Spectral clustering algorithms inevitably run into computational time and memory use problems at large scale, because they are both compute-intensive and data-intensive. We analyse the time complexity of constructing the similarity matrix, performing eigendecomposition, and running k-means, and exploit the SPMD parallel structure supported by the MATLAB Parallel Computing Toolbox (PCT) to decrease eigendecomposition time. We propose using MATLAB Distributed Computing Server to construct the similarity matrix in parallel, while using a t-nearest-neighbors approach to reduce memory use. Finally, we report clustering time, clustering quality, and clustering accuracy in the experiments.
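The memory saving from a t-nearest-neighbors similarity matrix can be sketched as follows: each sample keeps similarities only to its t closest samples, so storage grows roughly as O(N·t) rather than O(N²). This is a serial Python illustration rather than the paper's MATLAB implementation; the Gaussian kernel, the `sigma` parameter, the dictionary storage, and the toy points are assumptions made for the example.

```python
import math


def knn_similarity(points, t, sigma=1.0):
    """Sparse Gaussian similarity: keep each point's t nearest neighbours only."""
    n = len(points)
    sparse = {}
    for i in range(n):
        # Distances from point i to every other point, nearest first.
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:t]:
            w = math.exp(-d * d / (2.0 * sigma ** 2))
            sparse[(i, j)] = w
            sparse[(j, i)] = w  # symmetrise so the matrix stays usable as a graph
    return sparse


# Hypothetical toy data: two well-separated groups of three 2-D points.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
sim_sparse = knn_similarity(points, t=2)
```

Only within-group pairs are stored here, so the cross-group entries that a dense matrix would hold never need to be kept. The distance computation itself is still quadratic in this sketch, which is the part the abstract proposes to parallelise.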

HeteClass: A Meta-path based framework for transductive classification of objects in heterogeneous information networks

Expert Systems with Applications ◽

10.1016/j.eswa.2016.10.013 ◽

2017 ◽

Vol 68 ◽

pp. 106-122 ◽

Cited By ~ 17

Author(s):

Mukul Gupta ◽

Pradeep Kumar ◽

Bharat Bhasker

Keyword(s):

Information Networks ◽

Heterogeneous Information ◽

Heterogeneous Information Networks ◽

Meta Path

CHIN: Classification with META-PATH in Heterogeneous Information Networks

Communications in Computer and Information Science - Applied Informatics ◽

10.1007/978-3-030-01535-0_5 ◽

2018 ◽

pp. 63-74 ◽

Cited By ~ 1

Author(s):

Jinli Zhang ◽

Zongli Jiang ◽

Tong Li

Keyword(s):

Information Networks ◽

Heterogeneous Information ◽

Heterogeneous Information Networks ◽

Meta Path

Meta path-based collective classification in heterogeneous information networks

Proceedings of the 21st ACM international conference on Information and knowledge management - CIKM '12 ◽

10.1145/2396761.2398474 ◽

2012 ◽

Cited By ~ 54

Author(s):

Xiangnan Kong ◽

Philip S. Yu ◽

Ying Ding ◽

David J. Wild

Keyword(s):

Information Networks ◽

Collective Classification ◽

Heterogeneous Information ◽

Heterogeneous Information Networks ◽

Meta Path

A k-NN-Based Approach Using MapReduce for Meta-path Classification in Heterogeneous Information Networks

Soft Computing in Data Analytics - Advances in Intelligent Systems and Computing ◽

10.1007/978-981-13-0514-6_28 ◽

2018 ◽

pp. 277-284 ◽

Cited By ~ 2

Author(s):

Sadhana Kodali ◽

Madhavi Dabbiru ◽

B. Thirumala Rao ◽

U. Kartheek Chandra Patnaik

Keyword(s):

Information Networks ◽

Heterogeneous Information ◽

Heterogeneous Information Networks ◽

Meta Path

Introduction to the special issue on multilayer networks

Network Science ◽

10.1017/nws.2017.15 ◽

2017 ◽

Vol 5 (2) ◽

pp. 141-143 ◽

Cited By ~ 3

Author(s):

Matteo Magnani ◽

Stanley Wasserman

Keyword(s):

Social Networks ◽

Social Systems ◽

Information Networks ◽

Special Issue ◽

Heterogeneous Information ◽

Heterogeneous Information Networks ◽

Wide Range ◽

Different Types ◽

Interconnected Networks ◽

Multi Level

During the last century, networks of several types have been used to model a wide range of physical, biological and social systems. For example, Moreno (1934) studied social networks with multiple types of ties, later called multiplex networks (Verbrugge, 1979; Minor, 1983; Lazega & Pattison, 1999) as well as networks with multiple types of actors. Networks with multiple types of actors and relational ties have often been used together: relevant examples are the extensions of two-mode networks studied by Wasserman & Iacobucci (1991), multi-level networks (Lazega & Snijders, 2016), and heterogeneous information networks (Sun et al., 2012). More recently, researchers in physics and computer science have developed models for different types of interconnected networks known as networks of networks (Buldyrev et al., 2010; D'Agostino & Scala, 2014), multilayer social networks (Magnani & Rossi, 2011), and interconnected networks (Dickison et al., 2012).

Clustering via Meta-path Embedding for Heterogeneous Information Networks

2020 IEEE International Conference on Knowledge Graph (ICKG) ◽

10.1109/icbk50248.2020.00036 ◽

2020 ◽

Author(s):

Yongjun Zhang ◽

Xiaoping Yang ◽

Liang Wang

Keyword(s):

Information Networks ◽

Heterogeneous Information ◽

Heterogeneous Information Networks ◽

Meta Path

Meta-Path Based Inductive Classification in Heterogeneous Information Networks

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2016.5623 ◽

2016 ◽

Vol 13 (10) ◽

pp. 6747-6753

Author(s):

Pingjian Ding ◽

Xiangtao Chen ◽

Zipin Guan

Keyword(s):

Heterogeneous Networks ◽

State Of The Art ◽

Test Sample ◽

Information Networks ◽

Classification Problems ◽

Heterogeneous Information ◽

Heterogeneous Information Networks ◽

Meta Path ◽

Transductive Inference

The goal of inductive classification approaches is to infer the correct mapping from test set to labels, while the goal of transductive inference is to predict the correct labels for the given unlabeled data. Hence, newly added unlabeled samples cannot be classified by transductive classification. In this paper, we focus on studying the inductive classification problems in heterogeneous networks, which involve multiple types of objects interconnected by multiple types of links. Moreover, the objects and the links gradually increase over time. To accommodate the characteristics of heterogeneous networks, a meta-path-based heterogeneous inductive classification (Hic) method was proposed. First, different sub-networks were constructed according to the selected meta-paths. Second, the characteristic paths of each sub-network were extracted using a specified minimum support and were assigned appropriate weights. Then, the Hic model based on the characteristic paths was built. Finally, the Hic scores of each classification label for each test sample were calculated via the links between test samples and sub-networks. Experiments on DBLP showed that the proposed method significantly improves accuracy and stability over existing state-of-the-art methods for classification in dynamic heterogeneous networks.
