Weight-Discounted Symmetrization in Clustering Directed Graphs

Increasing attention has recently been devoted to uncovering community structure in directed graphs, which are common in real-world complex networks such as social networks, citation networks, the World Wide Web, and email networks. A two-stage framework is an effective way to cluster directed graphs: the first stage symmetrizes the directed graph using some similarity measure, and the second stage can apply any state-of-the-art clustering algorithm for undirected graphs. Hence, both stages matter to the quality of the clustering result. However, existing symmetrization methods consider only the direction of edges and ignore the weights of nodes. In this paper, we make a first attempt to connect link analysis with directed graph clustering. This connection not only takes the directionality of edges into account but also uses node ranking scores, such as authority and hub scores, to explicitly capture in-link and out-link similarity. We also demonstrate the generality of our proposed method by showing that existing state-of-the-art symmetrization methods can be derived from it. Empirical validation shows that our method can find communities effectively in real-world networks.
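To make the first stage concrete, the following is a minimal sketch of two well-known baseline symmetrizations of a directed adjacency matrix A: the sum A + Aᵀ and the bibliometric form AAᵀ + AᵀA. It is not the weight-discounted method of the abstract above, which additionally uses authority and hub scores. The function names and the four-node example graph are assumptions made for illustration.

```python
def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def add(a, b):
    """Add two matrices of the same shape element by element."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def symmetrize_sum(a):
    """U = A + A^T: keep every edge and drop its direction."""
    return add(a, transpose(a))


def symmetrize_bibliometric(a):
    """U = A A^T + A^T A: count shared out-links plus shared in-links."""
    at = transpose(a)
    return add(matmul(a, at), matmul(at, a))


# Hypothetical directed graph with edges 0->2, 1->2, 2->3.
A = [
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]

U_sum = symmetrize_sum(A)
U_bib = symmetrize_bibliometric(A)
```

In this example nodes 0 and 1 are not adjacent, so the sum symmetrization gives them weight 0, while the bibliometric form links them because both point to node 2. Either result is a symmetric matrix that an undirected clustering algorithm could take as input in the second stage.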

Symmetrization for Embedding Directed Graphs

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.330110043 ◽

2019 ◽

Vol 33 ◽

pp. 10043-10044

Author(s):

Jiankai Sun ◽

Srinivasan Parthasarathy

Keyword(s):

Directed Graph ◽

Undirected Graph ◽

State Of The Art ◽

Directed Graphs ◽

Graph Embedding ◽

Embedding Problem ◽

Two Stage ◽

Second Stage

In this paper, we propose to solve the directed graph embedding problem via a two-stage approach: in the first stage, the graph is symmetrized in one of several possible ways, and in the second stage, the resulting symmetrized graph is embedded using any state-of-the-art (undirected) graph embedding algorithm. Note that it is not the objective of this paper to propose a new (undirected) graph embedding algorithm or to discuss the strengths and weaknesses of existing ones; our point is only that whichever graph embedding algorithm is suitable, it will fit into the proposed symmetrization framework.

Lifelong Spectral Clustering

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.6045 ◽

2020 ◽

Vol 34 (04) ◽

pp. 5867-5874

Author(s):

Gan Sun ◽

Yang Cong ◽

Qianqian Wang ◽

Jun Li ◽

Yun Fu

Keyword(s):

Machine Learning ◽

Real World ◽

Spectral Clustering ◽

State Of The Art ◽

Clustering Algorithms ◽

Orthogonal Basis ◽

Learning Framework ◽

The Past ◽

Benchmark Datasets ◽

Over Time

In the past decades, spectral clustering (SC) has become one of the most effective clustering algorithms. However, most previous studies focus on spectral clustering with a fixed task set and cannot incorporate a new spectral clustering task without access to previously learned tasks. In this paper, we explore the problem of spectral clustering in a lifelong machine learning framework, i.e., Lifelong Spectral Clustering (L2SC). Its goal is to efficiently learn a model for a new spectral clustering task by selectively transferring previously accumulated experience from a knowledge library. Specifically, the knowledge library of L2SC contains two components: 1) an orthogonal basis library, capturing latent cluster centers among the clusters in each pair of tasks; 2) a feature embedding library, embedding the feature manifold information shared among multiple related tasks. When a new spectral clustering task arrives, L2SC first transfers knowledge from both the basis library and the feature library to obtain an encoding matrix, and then refines the library base over time to maximize performance across all the clustering tasks. Meanwhile, a general online update formulation is derived to alternately update the basis library and the feature library. Finally, experiments on several real-world benchmark datasets demonstrate that our L2SC model can effectively improve clustering performance compared with other state-of-the-art spectral clustering algorithms.

Connections between graphs and matrices in the modeling of mathematical problems (Conexões entre grafos e matrizes na modelagem de problemas matemáticos)

Ciência e Natura ◽

10.5902/2179460x35519 ◽

2019 ◽

Vol 40 ◽

pp. 183

Author(s):

Larissa Melchiors Furlan ◽

Mylena Roehrs ◽

Glauber Rodrigues de Quadros

Keyword(s):

Directed Graph ◽

Real World ◽

Adjacency Matrix ◽

Directed Graphs ◽

The Real ◽

Mathematical World ◽

The Everyday ◽

Mathematical Problems ◽

Everyday Problems

Graph theory is very important in the mathematical world as an excellent way of connecting with the real world. Using the theory of directed graphs, many everyday problems can be transformed into mathematical problems, so that each case can be studied exactly. In this work we explore the matrices related to the various types of graphs, such as the vertex matrix, which is associated with a directed graph, and the adjacency matrix. Moreover, matrices of multi-step connections are constructed so as to separate the various blades between the vertices of a directed graph. We then present some applications of those results in the form of examples.
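As an illustration of multi-step connection matrices, the sketch below uses the standard fact that entry (i, j) of the k-th power of a directed graph's vertex matrix counts the k-step connections from vertex i to vertex j. The function names and the three-vertex example graph are assumptions, not taken from the paper.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def connections(vertex_matrix, steps):
    """Return the matrix whose (i, j) entry counts the connections
    from vertex i to vertex j that use exactly `steps` edges."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    result = vertex_matrix
    for _ in range(steps - 1):
        result = matmul(result, vertex_matrix)
    return result


# Hypothetical directed graph with edges 0->1, 0->2, 1->2, 2->0.
M = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]

M2 = connections(M, 2)  # two-step connections
```

Here `M2[0][2]` is 1, corresponding to the single two-step route 0 -> 1 -> 2, and `M2[0][0]` is 1, corresponding to 0 -> 2 -> 0. Summing `M` and `M2` entrywise would count connections of at most two steps.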

Consensus Kernel K-Means Clustering for Incomplete Multiview Data

Computational Intelligence and Neuroscience ◽

10.1155/2017/3961718 ◽

2017 ◽

Vol 2017 ◽

pp. 1-11 ◽

Cited By ~ 5

Author(s):

Yongkai Ye ◽

Xinwang Liu ◽

Qiang Liu ◽

Jianping Yin

Keyword(s):

Real World ◽

State Of The Art ◽

Clustering Algorithms ◽

Multiple Views ◽

Learning Method ◽

Learning Framework ◽

Optimal Integration ◽

Art Methods ◽

Multiview Clustering

Multiview clustering aims to improve clustering performance through optimal integration of information from multiple views. Though demonstrating promising performance in various applications, existing multiview clustering algorithms cannot effectively handle the view’s incompleteness. Recently, one pioneering work was proposed that handled this issue by integrating multiview clustering and imputation into a unified learning framework. While its framework is elegant, we observe that it overlooks the consistency between views, which leads to a reduction in the clustering performance. In order to address this issue, we propose a new unified learning method for incomplete multiview clustering, which simultaneously imputes the incomplete views and learns a consistent clustering result with explicit modeling of between-view consistency. More specifically, the similarity between each view’s clustering result and the consistent clustering result is measured. The consistency between views is then modeled using the sum of these similarities. Incomplete views are imputed to achieve an optimal clustering result in each view, while maintaining between-view consistency. Extensive comparisons with state-of-the-art methods on both synthetic and real-world incomplete multiview datasets validate the superiority of the proposed method.

XML clustering: a review of structural approaches

The Knowledge Engineering Review ◽

10.1017/s0269888914000216 ◽

2014 ◽

Vol 30 (3) ◽

pp. 297-323 ◽

Cited By ~ 7

Author(s):

Maciej Piernik ◽

Dariusz Brzezinski ◽

Tadeusz Morzy ◽

Anna Lesniewska

Keyword(s):

State Of The Art ◽

Clustering Algorithms ◽

Similarity Measures ◽

Structural Similarity ◽

Research Area ◽

Future Research ◽

Markup Language ◽

Evaluation Measures ◽

Clustering Quality ◽

Extensible Markup

With its presence in data integration, chemistry, biological, and geographic systems, eXtensible Markup Language (XML) has become an important standard not only in computer science. A common problem among the mentioned applications involves structural clustering of XML documents, an issue that has been thoroughly studied and has led to the creation of a myriad of approaches. In this paper, we present a comprehensive review of structural XML clustering. First, we provide a basic introduction to the problem and highlight the main challenges in this research area. Subsequently, we divide the problem into three subtasks and discuss the most common document representations, structural similarity measures, and clustering algorithms. In addition, we present the most popular evaluation measures, which can be used to estimate clustering quality. Finally, we analyze and compare 23 state-of-the-art approaches and arrange them in an original taxonomy. By providing an up-to-date analysis of existing structural XML clustering algorithms, we hope to showcase methods suitable for current applications and draw lines of future research.

A Scalable Unsupervised Classification Method Using Rough Set for Remote Sensing Imagery

International Journal of Software Science and Computational Intelligence ◽

10.4018/ijssci.2021040104 ◽

2021 ◽

Vol 13 (2) ◽

pp. 65-88

Author(s):

Aditya Raj ◽

Sonajharia Minz

Keyword(s):

State Of The Art ◽

Clustering Algorithms ◽

Satellite Image ◽

Similarity Measures ◽

Image Data ◽

Unsupervised Classification ◽

Space Representation ◽

Geographic Scale ◽

Geographic Space ◽

Mixed Pixels

Reference to geographic scale and geographic space representation are characteristics of geospatial data. This work discusses two issues related to satellite image data, namely huge size and mixed pixels. In clustering, a form of unsupervised classification, similar objects are grouped together based on similarity measures. The similarity between intracluster objects is high, whereas the similarity between intercluster objects is low. This paper proposes a clustering technique called spatial rough k-means that classifies the mixed pixels based on their spatial neighbourhood relationship. The authors compared the performance of different state-of-the-art clustering algorithms with that of the proposed algorithm under both image partitioning and map-reduce methods. The results show that the proposed algorithm produces clusters of better quality than state-of-the-art algorithms in both approaches used for handling the vast input data size. Experiments conducted on Landsat-TM 5 data of the Delhi region demonstrate the effectiveness of the proposed work.

Discriminative and Correlative Partial Multi-Label Learning

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/512 ◽

2019 ◽

Cited By ~ 4

Author(s):

Haobo Wang ◽

Weiwei Liu ◽

Yang Zhao ◽

Chen Zhang ◽

Tianlei Hu ◽

...

Keyword(s):

Real World ◽

State Of The Art ◽

Feature Space ◽

Gradient Boosting ◽

Training Procedure ◽

Second Stage ◽

Label Correlations ◽

Real World Datasets ◽

Partial Label Learning ◽

Confidence Value

In partial multi-label learning (PML), each instance is associated with a candidate label set that contains multiple relevant labels and other false positive labels. The most challenging issue for PML is that the training procedure is prone to be affected by the labeling noise. We observe that state-of-the-art PML methods are either powerless to disambiguate the correct labels from the candidate labels or incapable of extracting the label correlations sufficiently. To fill this gap, a two-stage DiscRiminative and correlAtive partial Multi-label leArning (DRAMA) algorithm is presented in this work. In the first stage, a confidence value is learned for each label by utilizing the feature manifold, which indicates how likely a label is to be correct. In the second stage, a gradient boosting model is induced to fit the label confidences. Specifically, to explore the label correlations, we augment the feature space with the previously elicited labels on each boosting round. Extensive experiments on various real-world datasets clearly validate the superiority of our proposed method.

Personalised attraction recommendation for enhancing topic diversity and accuracy

Journal of Information Science ◽

10.1177/0165551521999801 ◽

2021 ◽

pp. 016555152199980

Author(s):

Yuanyuan Lin ◽

Chao Huang ◽

Wei Yao ◽

Yifei Shao

Keyword(s):

Real World ◽

Information Overload ◽

Two Stage ◽

Misclassification Cost ◽

Second Stage ◽

Low Visibility ◽

Recommendation Diversity ◽

Optimisation Model ◽

Definition Of ◽

Improved Methods

Attraction recommendation plays an important role in tourism, for example by solving information overload problems and recommending suitable attractions to users. Currently, most recommendation methods are dedicated to improving the accuracy of recommendations. However, recommendation methods that focus only on accuracy tend to recommend popular items that are often purchased by users, which results in a lack of diversity and low visibility of non-popular items. Hence, many studies have stressed the importance of recommendation diversity and proposed improved methods, but there is room for improvement. First, the definition of diversity for different items requires consideration of domain characteristics. Second, the existing algorithms for improving diversity sacrifice the accuracy of recommendations. Therefore, the article utilises the topic ‘features of attractions’ to define the calculation method of recommendation diversity. We developed a two-stage optimisation model to enhance recommendation diversity while maintaining the accuracy of recommendations. In the first stage, an optimisation model considering topic diversity is proposed to increase recommendation diversity and generate candidate attractions. In the second stage, we propose an optimisation model that minimises misclassification cost to balance recommendation diversity and accuracy. To assess the performance of the proposed method, experiments are conducted with real-world travel data. The results indicate that the proposed two-stage optimisation model can significantly improve the diversity and accuracy of recommendations.

An information theoretic approach to link prediction in multiplex networks

Scientific Reports ◽

10.1038/s41598-021-92427-1 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Seyed Hossein Jafari ◽

Amir Mahdi Abdolhosseini-Qomi ◽

Masoud Asadpour ◽

Maseud Rahgozar ◽

Naser Yazdani

Keyword(s):

Real World ◽

Link Prediction ◽

Large Scale ◽

Similarity Measures ◽

Prediction Method ◽

General Purpose ◽

Fast Method ◽

Theoretic Approach ◽

Multiplex Networks ◽

Wide Range

The entities of real-world networks are connected via different types of connections (i.e., layers). The task of link prediction in multiplex networks is to find missing connections based on both intra-layer and inter-layer correlations. Our observations confirm that in a wide range of real-world multiplex networks, from social to biological and technological, a positive correlation exists between connection probability in one layer and similarity in other layers. Accordingly, a similarity-based automatic general-purpose multiplex link prediction method, SimBins, is devised that quantifies the amount of connection uncertainty based on observed inter-layer correlations in a multiplex network. Moreover, SimBins enhances the prediction quality in the target layer by incorporating the effect of link overlap across layers. Applying SimBins to various datasets from diverse domains, we find that it outperforms the compared methods (both baseline and state-of-the-art methods) in most instances when predicting links. Furthermore, we discuss how SimBins imposes only minor computational overhead on the base similarity measures, making it a potentially fast method suitable for large-scale multiplex networks.
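The inter-layer correlation described above can be probed with a small sketch. This is not the SimBins method itself; it only groups node pairs by a common-neighbours similarity computed in one layer and reports the fraction of pairs in each group that are linked in another layer. The choice of common neighbours as the similarity measure, the function names, and the toy two-layer network are all assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def common_neighbors(layer, u, v):
    """Number of neighbours that u and v share in the given layer."""
    return len(layer[u] & layer[v])


def link_rate_by_similarity(target, auxiliary):
    """For each similarity value observed in the auxiliary layer, return the
    fraction of node pairs with that value that are linked in the target layer.
    Both layers map every node to the set of its neighbours."""
    totals = defaultdict(int)
    linked = defaultdict(int)
    for u, v in combinations(sorted(target), 2):
        score = common_neighbors(auxiliary, u, v)
        totals[score] += 1
        if v in target[u]:
            linked[score] += 1
    return {score: linked[score] / totals[score] for score in totals}


# Hypothetical two-layer undirected network on nodes 0..3.
auxiliary_layer = {0: {2, 3}, 1: {2, 3}, 2: {0, 1}, 3: {0, 1}}
target_layer = {0: {1}, 1: {0}, 2: {3}, 3: {2}}

rates = link_rate_by_similarity(target_layer, auxiliary_layer)
```

In this toy network, the two pairs with two common neighbours in the auxiliary layer are both linked in the target layer, and the four pairs with none are not, so `rates` rises with similarity. On real data the same table would give a rough picture of how strongly one layer's similarity predicts links in another.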

Extrinsic Camera Calibration with Line-Laser Projection

Sensors ◽

10.3390/s21041091 ◽

2021 ◽

Vol 21 (4) ◽

pp. 1091

Author(s):

Izaak Van Crombrugge ◽

Rudi Penne ◽

Steve Vanlanduit

Keyword(s):

Camera Calibration ◽

Real World ◽

Large Scale ◽

State Of The Art ◽

Bundle Adjustment ◽

Field Of View ◽

Extrinsic Calibration ◽

Practical Procedure ◽

Partial Overlap

Knowledge of precise camera poses is vital for multi-camera setups. Camera intrinsics can be obtained for each camera separately in lab conditions. For fixed multi-camera setups, the extrinsic calibration can only be done in situ. Usually, some markers are used, like checkerboards, requiring some level of overlap between cameras. In this work, we propose a method for cases with little or no overlap. Laser lines are projected on a plane (e.g., floor or wall) using a laser line projector. The pose of the plane and cameras is then optimized using bundle adjustment to match the lines seen by the cameras. To find the extrinsic calibration, only a partial overlap between the laser lines and the field of view of the cameras is needed. Real-world experiments were conducted both with and without overlapping fields of view, resulting in rotation errors below 0.5°. We show that the accuracy is comparable to other state-of-the-art methods while offering a more practical procedure. The method can also be used in large-scale applications and can be fully automated.
