Spectral Clustering in Heterogeneous Information Networks

Author(s):  
Xiang Li ◽  
Ben Kao ◽  
Zhaochun Ren ◽  
Dawei Yin

A heterogeneous information network (HIN) is one whose objects are of different types and whose links can model different relations between objects. We study how spectral clustering can be applied effectively to HINs. In particular, we focus on how meta-path relations can be used to construct an effective similarity matrix on which spectral clustering is performed. We formulate similarity matrix construction as an optimization problem and propose the SClump algorithm to solve it. We conduct extensive experiments comparing SClump with other state-of-the-art clustering algorithms for HINs. Our results show that SClump outperforms its competitors across a range of datasets with respect to different clustering quality measures.
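
The abstract does not spell out SClump's objective, so what follows is only a minimal sketch of the usual ingredients of meta-path-based similarity, under stated assumptions: a toy author/paper/venue network, commuting matrices obtained by chaining adjacency matrices along a symmetric meta-path (A-P-A, A-P-V-P-A), the PathSim measure to turn each commuting matrix into a similarity matrix, and a weighted sum with hand-picked weights. SClump obtains its similarity matrix by solving an optimization problem instead; the fixed weights, the toy data and all function names here are hypothetical.

```python
from typing import List

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense matrix product; adequate for a toy network."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]

def commuting_matrix(relations: List[Matrix]) -> Matrix:
    """Chain adjacency matrices along a meta-path; entry (x, y) counts path instances from x to y."""
    result = relations[0]
    for r in relations[1:]:
        result = matmul(result, r)
    return result

def pathsim(m: Matrix) -> Matrix:
    """PathSim for a symmetric meta-path: s(x, y) = 2 * M[x][y] / (M[x][x] + M[y][y])."""
    n = len(m)
    return [[2 * m[x][y] / (m[x][x] + m[y][y]) if m[x][x] + m[y][y] else 0.0
             for y in range(n)] for x in range(n)]

def combine(sims: List[Matrix], weights: List[float]) -> Matrix:
    """Weighted sum of per-meta-path similarities (hypothetical fixed weights, not learned)."""
    n = len(sims[0])
    return [[sum(w * s[i][j] for w, s in zip(weights, sims)) for j in range(n)]
            for i in range(n)]

# Toy HIN (assumed data): 4 authors, 3 papers, 2 venues.
ap = [[1, 1, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]  # author-paper adjacency
pv = [[1, 0], [1, 0], [0, 1]]                        # paper-venue adjacency
pa, vp = transpose(ap), transpose(pv)

s_apa = pathsim(commuting_matrix([ap, pa]))            # similarity via shared papers
s_apvpa = pathsim(commuting_matrix([ap, pv, vp, pa]))  # similarity via shared venues
s = combine([s_apa, s_apvpa], [0.5, 0.5])
print(round(s[0][1], 3))  # 0.733: authors 0 and 1 share a paper and a venue
print(s[0][2])            # 0.0: no path connects authors 0 and 2
```

A matrix built this way would then be handed to an ordinary spectral clustering routine (graph Laplacian, leading eigenvectors, k-means); how SClump couples the two steps is not described in the abstract.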

Author(s):  
Juanjuan Luo ◽  
Huadong Ma ◽  
Dongqing Zhou

The similarity matrix has a significant effect on the performance of spectral clustering, and determining the neighborhood in the similarity matrix effectively is one of the main difficulties. In this paper, a "divide and conquer" strategy is proposed that models the similarity matrix construction task with a multiobjective evolutionary algorithm (MOEA). The procedure is divided into two phases: Phase I determines the nonzero entries of the similarity matrix, and Phase II determines the values of those nonzero entries. In Phase I, the main contribution is to model the task as a biobjective dynamic optimization problem that optimizes diversity and similarity simultaneously. Each individual determines one nonzero entry for each sample, so the encoding length decreases to O(N), in contrast with non-ensemble multiobjective spectral clustering. In addition, a specific initialization operator and a diversity preservation strategy are proposed for this phase. In Phase II, three ensemble strategies are designed to determine the values of the nonzero entries of the similarity matrix. Furthermore, this Pareto ensemble framework is extended to semi-supervised clustering by transforming the semi-supervised information into constraints. In contrast with previous multiobjective evolutionary spectral clustering algorithms, the proposed Pareto ensemble-based framework strikes a balance between time cost and clustering accuracy, as demonstrated in the experiments section.
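
The abstract gives the structure of the framework but not its objective functions or its three ensemble strategies, so the following is a hypothetical sketch of the two phases. Each individual is encoded as one neighbour index per sample (the O(N) encoding), a non-dominated filter keeps the Pareto-optimal individuals under two objectives whose values are simply given as numbers, and a vote-counting ensemble, which is an assumption and not taken from the paper, fills in the values of the nonzero entries. The evolutionary operators (initialization, variation, diversity preservation) are omitted, and all names are illustrative.

```python
from typing import List, Tuple

# Hypothetical encoding: individual[i] = index of the single neighbour chosen for sample i,
# so each individual fixes one nonzero entry per row and has length N.
Individual = List[int]
Objectives = Tuple[float, float]  # (similarity, diversity), both maximised (assumed)

def dominates(a: Objectives, b: Objectives) -> bool:
    """a Pareto-dominates b: no worse in every objective and strictly better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(pop: List[Individual], objs: List[Objectives]) -> List[Individual]:
    """Keep the individuals whose objective vectors are dominated by no other individual."""
    return [ind for ind, f in zip(pop, objs) if not any(dominates(g, f) for g in objs)]

def vote_similarity(front: List[Individual], n: int) -> List[List[float]]:
    """One possible Phase II ensemble (an assumption): entry (i, j) is the share of
    Pareto-optimal individuals that picked j as i's neighbour, symmetrised by averaging."""
    w = [[0.0] * n for _ in range(n)]
    for ind in front:
        for i, j in enumerate(ind):
            w[i][j] += 1.0 / len(front)
    return [[(w[i][j] + w[j][i]) / 2 for j in range(n)] for i in range(n)]

pop = [[1, 0, 3, 2], [1, 2, 3, 2], [2, 3, 0, 1]]
objs = [(0.9, 0.2), (0.7, 0.6), (0.5, 0.1)]  # made-up objective values
front = pareto_front(pop, objs)               # the third individual is dominated by the first
s = vote_similarity(front, 4)
print(s[0][1], s[1][2])                       # 0.75 0.25
```

Vote counting is used here only because it needs no extra parameters; any of the paper's actual strategies could replace `vote_similarity` without changing Phase I.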


2013 ◽  
Vol 765-767 ◽  
pp. 580-584
Author(s):  
Yu Yang ◽  
Cheng Gui Zhao

Large-scale spectral clustering inevitably suffers from high computational time and memory use, because it is both compute-intensive and data-intensive. We analyse the time complexity of constructing the similarity matrix, performing the eigendecomposition and running k-means, and we exploit the SPMD parallel structure supported by the MATLAB Parallel Computing Toolbox (PCT) to reduce the eigendecomposition time. We propose using MATLAB Distributed Computing Server to construct the similarity matrix in parallel, while using a t-nearest-neighbors approach to reduce memory use. Finally, we report clustering time, clustering quality and clustering accuracy in the experiments.
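
The paper works in MATLAB (PCT/SPMD and Distributed Computing Server); the following Python sketch is a swapped-in illustration, not the authors' code, of the two ideas the abstract names for similarity matrix construction. Rows of the matrix are computed in independent blocks that could be handed to separate workers, and each row keeps only its t nearest neighbours, so storage falls from O(N²) for a dense matrix to O(N·t). The Gaussian kernel, the bandwidth `sigma`, the "either endpoint" symmetrisation rule and all function names are assumptions. Threads stand in for workers only to keep the example short; in CPython they do not speed up pure-Python arithmetic, so a process pool or a cluster would be the real analogue.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence

SparseRows = Dict[int, Dict[int, float]]  # row index -> {column index: similarity}

def knn_rows(points: List[Sequence[float]], rows: range, t: int, sigma: float) -> SparseRows:
    """Similarity rows for one block of samples, keeping only the t nearest neighbours."""
    out: SparseRows = {}
    for i in rows:
        # Squared Euclidean distance from sample i to every other sample.
        dists = ((sum((a - b) ** 2 for a, b in zip(points[i], points[j])), j)
                 for j in range(len(points)) if j != i)
        # Gaussian kernel on the t smallest distances (kernel choice is an assumption).
        out[i] = {j: math.exp(-d2 / (2 * sigma ** 2)) for d2, j in heapq.nsmallest(t, dists)}
    return out

def sparse_similarity(points: List[Sequence[float]], t: int, sigma: float,
                      n_workers: int = 2) -> SparseRows:
    """Split the rows into blocks, build each block independently, then symmetrise."""
    n = len(points)
    step = max(1, math.ceil(n / n_workers))
    blocks = [range(s, min(s + step, n)) for s in range(0, n, step)]
    rows: SparseRows = {}
    # Row blocks share no state, so they could equally go to separate processes or nodes.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for part in pool.map(lambda r: knn_rows(points, r, t, sigma), blocks):
            rows.update(part)
    # Symmetrise: keep edge (i, j) if either sample lists the other as a neighbour.
    sym: SparseRows = {i: dict(rows[i]) for i in range(n)}
    for i in range(n):
        for j, w in rows[i].items():
            sym[j][i] = w  # the Gaussian weight is symmetric in (i, j)
    return sym

sym = sparse_similarity([(0.0,), (1.0,), (3.0,)], t=1, sigma=1.0)
print(sorted(sym[1]))  # [0, 2]: sample 2 lists sample 1 as its nearest neighbour
```

Symmetrising with the "either endpoint" rule keeps the graph undirected, which the Laplacian used in the eigendecomposition step requires; the alternative "both endpoints" rule would give a sparser matrix.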


2017 ◽  
Vol 5 (2) ◽  
pp. 141-143 ◽  
Author(s):  
Matteo Magnani ◽  
Stanley Wasserman

During the last century, networks of several types have been used to model a wide range of physical, biological and social systems. For example, Moreno (1934) studied social networks with multiple types of ties, later called multiplex networks (Verbrugge, 1979; Minor, 1983; Lazega & Pattison, 1999), as well as networks with multiple types of actors. Multiple types of actors and multiple types of relational ties have often been used together: relevant examples are the extensions of two-mode networks studied by Wasserman & Iacobucci (1991), multi-level networks (Lazega & Snijders, 2016), and heterogeneous information networks (Sun et al., 2012). More recently, researchers in physics and computer science have developed models for different types of interconnected networks, known as networks of networks (Buldyrev et al., 2010; D'Agostino & Scala, 2014), multilayer social networks (Magnani & Rossi, 2011), and interconnected networks (Dickison et al., 2012).


2016 ◽  
Vol 13 (10) ◽  
pp. 6747-6753
Author(s):  
Pingjian Ding ◽  
Xiangtao Chen ◽  
Zipin Guan

The goal of inductive classification is to infer a correct mapping from samples to labels that can then be applied to the test set, whereas the goal of transductive inference is to predict the correct labels only for the given unlabeled data. Hence, newly added unlabeled samples cannot be classified by transductive classification. In this paper, we focus on inductive classification in heterogeneous networks, which involve multiple types of objects interconnected by multiple types of links, and in which the numbers of objects and links grow over time. To accommodate these characteristics of heterogeneous networks, we propose a meta-path-based heterogeneous inductive classification method (Hic). First, different sub-networks are constructed according to the selected meta-paths. Second, the characteristic paths of each sub-network are extracted using a specified minimum support and assigned appropriate weights. Then, the Hic model is built on these characteristic paths. Finally, the Hic score of each classification label for each test sample is calculated from the links between the test sample and the sub-networks. Experiments on DBLP show that the proposed method significantly improves accuracy and stability over existing state-of-the-art methods for classification in dynamic heterogeneous networks.
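
The abstract describes Hic only at the level of its steps, so the following is a hypothetical simplification, not the authors' model. Each sub-network is represented as links from objects to the intermediate nodes of one meta-path (for example paper to author for P-A-P), a "characteristic path" is taken to be a (node, label) pair whose support among labelled objects reaches `min_support`, its weight is its confidence, and the meta-path weights are fixed by hand. All names and the toy data are assumptions. What the sketch does illustrate is the inductive property: a new paper is scored from its own links alone, without re-running inference over the whole network.

```python
from collections import defaultdict
from typing import Dict, Set

Links = Dict[str, Set[str]]          # object -> intermediate nodes in one sub-network
Model = Dict[str, Dict[str, float]]  # node -> {label: weight}

def characteristic_paths(links: Links, labels: Dict[str, str], min_support: int) -> Model:
    """Keep (node, label) paths seen at least min_support times among labelled objects,
    weighting each kept path by its confidence (hypothetical weighting)."""
    support: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for obj, nodes in links.items():
        for node in nodes:
            support[node][labels[obj]] += 1
    model: Model = {}
    for node, counts in support.items():
        total = sum(counts.values())
        kept = {lab: c / total for lab, c in counts.items() if c >= min_support}
        if kept:
            model[node] = kept
    return model

def score(test_links: Dict[str, Set[str]], models: Dict[str, Model],
          mp_weights: Dict[str, float]) -> Dict[str, float]:
    """Score every label for one unseen object using only its links into each sub-network."""
    scores: Dict[str, float] = defaultdict(float)
    for mp, nodes in test_links.items():
        for node in nodes:
            for lab, w in models.get(mp, {}).get(node, {}).items():
                scores[lab] += mp_weights[mp] * w
    return dict(scores)

labels = {"p1": "DB", "p2": "DB", "p3": "ML"}
train = {  # assumed toy DBLP-like data: paper -> authors, paper -> venue
    "P-A-P": {"p1": {"a1"}, "p2": {"a1", "a2"}, "p3": {"a3"}},
    "P-V-P": {"p1": {"v1"}, "p2": {"v1"}, "p3": {"v2"}},
}
weights = {"P-A-P": 0.6, "P-V-P": 0.4}
models = {mp: characteristic_paths(l, labels, min_support=1) for mp, l in train.items()}
new_paper = {"P-A-P": {"a1", "a3"}, "P-V-P": {"v2"}}  # not seen during training
s = score(new_paper, models, weights)
print(max(s, key=s.get))  # ML (score 1.0 versus 0.6 for DB)
```

Raising `min_support` prunes rare paths: with `min_support=2` only the a1 and v1 paths survive, and the same paper would score only for DB.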

