Local clustering via approximate heat kernel PageRank with subgraph sampling

Abstract

Graph clustering, a fundamental technique in network science for understanding structures in complex systems, presents inherent difficulties. Although it has been studied extensively in the literature, graph clustering remains particularly challenging for large systems because massive graphs impose a prohibitively large computational load. The heat kernel PageRank provides a quantitative ranking of nodes, and a local cluster can be found efficiently by performing a sweep over the heat kernel PageRank vector. However, computing an exact heat kernel PageRank vector can be expensive, so approximate algorithms are often used instead. Most approximate algorithms compute the heat kernel PageRank vector on the whole graph and therefore depend on global structure. In this paper, we present an algorithm for approximating the heat kernel PageRank on a local subgraph. We show that the number of computations required by the proposed algorithm is sublinear in the expected size of the local cluster of interest, and that it provides a good approximation of the heat kernel PageRank, with a probabilistic guarantee on the approximation error. Numerical experiments verify that the local clustering algorithm using our approximate heat kernel PageRank achieves state-of-the-art performance.
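
The abstract relies on two ingredients: the heat kernel PageRank vector and a sweep over it. The minimal Python sketch below illustrates both under simplifying assumptions. It uses the standard series definition rho = e^{-t} * sum_k (t^k / k!) * chi_s P^k with P = D^{-1} A, truncated after a fixed number of terms and computed exactly over every vertex the walk reaches. It does not reproduce the paper's subgraph-sampling approximation; the function names, the toy graph, and the parameters `t` and `terms` are illustrative choices only.

```python
import math
from typing import Dict, List, Set, Tuple

# Undirected simple graph (no self-loops, no isolated vertices) as adjacency lists.
Graph = Dict[int, List[int]]


def heat_kernel_pagerank(graph: Graph, seed: int, t: float = 2.0, terms: int = 30) -> Dict[int, float]:
    """Truncated series rho = e^{-t} * sum_{k < terms} t^k / k! * chi_seed P^k, with P = D^{-1} A."""
    walk: Dict[int, float] = {seed: 1.0}  # row vector chi_seed P^k, stored sparsely
    coeff = math.exp(-t)                  # e^{-t} t^k / k!, starting at k = 0
    rho: Dict[int, float] = {}
    for k in range(terms):
        for v, mass in walk.items():
            rho[v] = rho.get(v, 0.0) + coeff * mass
        # One random-walk step: each vertex splits its mass evenly among its neighbours.
        nxt: Dict[int, float] = {}
        for u, mass in walk.items():
            share = mass / len(graph[u])
            for w in graph[u]:
                nxt[w] = nxt.get(w, 0.0) + share
        walk = nxt
        coeff *= t / (k + 1)
    return rho


def sweep(graph: Graph, rho: Dict[int, float]) -> Tuple[Set[int], float]:
    """Return the prefix of vertices (sorted by rho(v) / d(v)) with the smallest conductance."""
    degree = {v: len(nbrs) for v, nbrs in graph.items()}
    total_vol = sum(degree.values())
    order = sorted(rho, key=lambda v: rho[v] / degree[v], reverse=True)
    current: Set[int] = set()
    vol = cut = 0
    best_set: Set[int] = set()
    best_phi = math.inf
    for v in order:
        inside = sum(1 for w in graph[v] if w in current)
        cut += degree[v] - 2 * inside  # edges into `current` become internal, the rest are cut
        vol += degree[v]
        current.add(v)
        denom = min(vol, total_vol - vol)
        if denom == 0:  # the prefix now covers the whole graph
            break
        phi = cut / denom
        if phi < best_phi:
            best_set, best_phi = set(current), phi
    return best_set, best_phi


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
g: Graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
rho = heat_kernel_pagerank(g, seed=0, t=2.0)
cluster, phi = sweep(g, rho)
print(sorted(cluster), round(phi, 4))  # [0, 1, 2] 0.1429 (conductance 1/7)
```

Seeded at vertex 0, the sweep recovers the seed's triangle, whose single boundary edge gives conductance 1/7. The cost of this naive version grows with the part of the graph the walk reaches, which is exactly the global dependence the abstract says local methods avoid.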


A Local Clustering Algorithm for Massive Graphs and Its Application to Nearly Linear Time Graph Partitioning

SIAM Journal on Computing ◽

10.1137/080744888 ◽

2013 ◽

Vol 42 (1) ◽

pp. 1-26 ◽

Cited By ~ 82

Author(s):

Daniel A. Spielman ◽

Shang-Hua Teng

Keyword(s):

Graph Partitioning ◽

Clustering Algorithm ◽

Linear Time ◽

Massive Graphs ◽

Time Graph ◽

Local Clustering


Computing Heat Kernel Pagerank and a Local Clustering Algorithm

Lecture Notes in Computer Science - Combinatorial Algorithms ◽

10.1007/978-3-319-19315-1_10 ◽

2015 ◽

pp. 110-121 ◽

Cited By ~ 3

Author(s):

Fan Chung ◽

Olivia Simpson

Keyword(s):

Heat Kernel ◽

Clustering Algorithm ◽

Local Clustering


Computing heat kernel pagerank and a local clustering algorithm

European Journal of Combinatorics ◽

10.1016/j.ejc.2017.07.013 ◽

2018 ◽

Vol 68 ◽

pp. 96-119 ◽

Cited By ~ 4

Author(s):

Fan Chung ◽

Olivia Simpson

Keyword(s):

Heat Kernel ◽

Clustering Algorithm ◽

Local Clustering


Analyzing Data Changes using Mean Shift Clustering

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001416500166 ◽

2016 ◽

Vol 30 (07) ◽

pp. 1650016 ◽

Cited By ~ 3

Author(s):

Nir Sharet ◽

Ilan Shimshoni

Keyword(s):

Clustering Algorithm ◽

State Of The Art ◽

Mean Shift ◽

Stereo Image ◽

Learning Method ◽

Training Set ◽

Local Cluster ◽

Mean Shift Clustering ◽

Cluster Distribution ◽

The Mean

A nonparametric, unsupervised method for analyzing changes in complex datasets is proposed, based on the mean shift clustering algorithm. Mean shift is used to cluster the old and new datasets, and the results are compared in a nonparametric manner. Each point in the new dataset naturally belongs to a cluster of points from its own dataset. The method can also determine which cluster the point would belong to in the old dataset, and it uses this information to report qualitative differences between the old and new datasets. Changes in the local cluster distribution are also reported. The report can then help identify the underlying causes of the changes in the distributions. Building on this method, a transductive transfer learning method for automatically labeling data from the new dataset is also proposed. The labeled data is used, together with the old training set, to train a classifier better suited to the new dataset. The algorithm has been implemented and tested on simulated and real (a stereo image pair) datasets, and its performance was compared with several state-of-the-art methods.
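
As a rough illustration of the two mean shift steps described above (clustering a dataset by density modes, then locating the old-dataset cluster that a new point belongs to), here is a minimal one-dimensional Python sketch. It assumes a Gaussian kernel with a fixed bandwidth; the helper `assign_to_old`, the merge tolerance, and the toy data are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Tuple


def mean_shift_mode(x: float, data: List[float], bandwidth: float,
                    tol: float = 1e-9, max_iter: int = 1000) -> float:
    """Follow Gaussian-kernel mean shift from x until it settles on a density mode of `data`."""
    for _ in range(max_iter):
        weights = [math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in data]
        new_x = sum(w * p for w, p in zip(weights, data)) / sum(weights)
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x


def mean_shift_cluster(data: List[float], bandwidth: float,
                       merge_tol: float = 1e-3) -> Tuple[List[float], List[int]]:
    """Cluster points by the mode their mean shift trajectory converges to."""
    modes: List[float] = []
    labels: List[int] = []
    for p in data:
        m = mean_shift_mode(p, data, bandwidth)
        for i, existing in enumerate(modes):
            if abs(m - existing) < merge_tol:
                labels.append(i)
                break
        else:  # no existing mode is close enough: record a new cluster
            modes.append(m)
            labels.append(len(modes) - 1)
    return modes, labels


def assign_to_old(point: float, old_data: List[float], old_modes: List[float],
                  bandwidth: float) -> int:
    """Hypothetical helper: climb the OLD density from `point`, then return the nearest old mode."""
    m = mean_shift_mode(point, old_data, bandwidth)
    return min(range(len(old_modes)), key=lambda i: abs(old_modes[i] - m))


old = [0.0, 0.2, 0.4, 10.0, 10.2, 10.4]
old_modes, old_labels = mean_shift_cluster(old, bandwidth=1.0)
print(old_labels, assign_to_old(9.0, old, old_modes, bandwidth=1.0))  # [0, 0, 0, 1, 1, 1] 1
```

Running mean shift on the old data from a new point's position is one simple way to ask "which old cluster would this point fall into"; the paper's actual comparison procedure may differ.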


Semantic frame induction through the detection of communities of verbs and their arguments

Applied Network Science ◽

10.1007/s41109-020-00312-z ◽

2020 ◽

Vol 5 (1) ◽

Author(s):

Eugénio Ribeiro ◽

Andreia Sofia Teixeira ◽

Ricardo Ribeiro ◽

David Martins de Matos

Keyword(s):

Community Detection ◽

Clustering Algorithm ◽

State Of The Art ◽

Graph Clustering ◽

Semantic Role ◽

Detection Problem ◽

Current State ◽

Textual Data ◽

Semantic Frames ◽

Do So

Abstract

Resources such as FrameNet, which provide sets of semantic frame definitions and annotated textual data that map to the evoked frames, are important for several NLP tasks. However, they are expensive to build and are consequently unavailable for many languages and domains. Approaches that can induce semantic frames in an unsupervised manner are therefore highly valuable. In this paper, we approach this task from a network perspective, as a community detection problem that targets the identification of groups of verb instances that evoke the same semantic frame and of verb arguments that play the same semantic role. To do so, we apply a graph-clustering algorithm to a graph whose nodes are contextualized representations of verb instances or arguments, with an edge between two nodes if the distance between them is below a threshold that defines the granularity of the induced frames. Applied to the benchmark dataset defined in the context of SemEval 2019, this approach outperforms all previous approaches to the task, achieving current state-of-the-art performance.
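
The role of the distance threshold as a granularity knob can be seen in a small Python sketch. It makes several assumptions that are not in the abstract: toy 2-D points stand in for contextualized embeddings, the distance is Euclidean, and plain connected components stand in for the (unspecified here) graph-clustering algorithm.

```python
import math
from typing import List, Sequence


def threshold_graph(points: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """Connect two points when their Euclidean distance is below `threshold`."""
    n = len(points)
    adj: List[List[int]] = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < threshold:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def connected_components(adj: List[List[int]]) -> List[int]:
    """Label each node with the id of its connected component (iterative DFS)."""
    label = [-1] * len(adj)
    count = 0
    for start in range(len(adj)):
        if label[start] != -1:
            continue
        label[start] = count
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if label[v] == -1:
                    label[v] = count
                    stack.append(v)
        count += 1
    return label


# Toy 2-D stand-ins for contextualized representations of verb instances.
pts = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.5), (5.0, 5.0), (5.0, 6.0)]
print(connected_components(threshold_graph(pts, 1.2)))  # finer:   [0, 0, 1, 2, 2]
print(connected_components(threshold_graph(pts, 2.0)))  # coarser: [0, 0, 0, 1, 1]
```

Raising the threshold adds edges and merges groups, so the threshold directly controls how fine-grained the induced clusters are, which is the property the abstract attributes to it.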


Segmentation of SAR Image using Fuzzy C-Means and Filters

Science & Technology Journal ◽

10.22232/stj.2020.08.01.11 ◽

2020 ◽

Vol 8 (1) ◽

pp. 84-90

Author(s):

R. Lalchhanhima ◽

Debdatta Kandar ◽

R. Chawngsangpuii ◽

Vanlalmuansangi Khenglawt ◽

...

Keyword(s):

Clustering Algorithm ◽

State Of The Art ◽

Speckle Noise ◽

Synthetic Aperture Radar Image ◽

Synthetic Aperture ◽

Sar Image ◽

Spatial Filters ◽

Fuzzy C Means ◽

Automatic Clustering ◽

Intensity Information

Fuzzy C-Means (FCM) is an unsupervised algorithm for the automatic clustering of data. Synthetic aperture radar (SAR) image segmentation is challenging because of the presence of speckle noise, so the segmentation process cannot rely on intensity information alone; it must also consider several derived features to obtain satisfactory results. In this paper, we use the fuzzy nature of FCM classification for unsupervised region segmentation. Features are obtained by filtering the image with different spatial filters, and these features are selected as segmentation criteria. Segmentation performance is measured by accuracy and compared with several recently proposed state-of-the-art techniques.
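
For readers unfamiliar with FCM, the following minimal Python sketch shows the standard alternating updates (memberships from distances, centers as membership-weighted means) on scalar values. It is a generic illustration, not the paper's pipeline: the toy "filtered intensities", the fuzzifier m = 2, the initial centers, and the fixed iteration count are all assumptions.

```python
from typing import List, Tuple


def fuzzy_c_means(xs: List[float], centers: List[float], m: float = 2.0,
                  iters: int = 100, eps: float = 1e-12) -> Tuple[List[float], List[List[float]]]:
    """Standard FCM on scalar features; returns (centers, membership matrix u[i][j])."""
    c = list(centers)
    u: List[List[float]] = []
    for _ in range(iters):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
        u = []
        for x in xs:
            d = [abs(x - cj) + eps for cj in c]  # eps avoids division by zero at a center
            u.append([1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0)) for k in range(len(c)))
                      for j in range(len(c))])
        # Center update: weighted mean of the data with weights u_ij^m.
        c = [sum(u[i][j] ** m * xs[i] for i in range(len(xs)))
             / sum(u[i][j] ** m for i in range(len(xs)))
             for j in range(len(c))]
    return c, u


intensities = [10.0, 12.0, 11.0, 200.0, 205.0, 198.0]  # toy values after spatial filtering
centers, u = fuzzy_c_means(intensities, centers=[0.0, 255.0])
labels = [max(range(len(row)), key=row.__getitem__) for row in u]  # hard segmentation
print([round(cj, 1) for cj in centers], labels)
```

Each pixel keeps a graded membership in every cluster, and a hard segment label is read off as the cluster with the largest membership; with several filtered features per pixel, the absolute difference would be replaced by a vector distance.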


Malaria parasite detection in thick blood smear microscopic images using modified YOLOV3 and YOLOV4 models

BMC Bioinformatics ◽

10.1186/s12859-021-04036-4 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Fetulhak Abdurahman ◽

Kinde Anlay Fante ◽

Mohammed Aliy

Keyword(s):

Object Detection ◽

Malaria Parasite ◽

Blood Smear ◽

Clustering Algorithm ◽

State Of The Art ◽

Detection Accuracy ◽

Small Object ◽

Thick Blood Smear ◽

Malaria Parasites ◽

Microscopic Images

Abstract

Background: Manual microscopic examination of Leishman/Giemsa-stained thin and thick blood smears is still the “gold standard” for malaria diagnosis. One drawback of this method is that its accuracy, consistency, and diagnostic speed depend on the microscopist's diagnostic and technical skills, and highly skilled microscopists are difficult to find in remote areas of developing countries. To alleviate this problem, we investigate state-of-the-art one-stage and two-stage object detection algorithms for automated malaria parasite screening from microscopic images of thick blood slides.

Results: YOLOV3 and YOLOV4, which are state-of-the-art object detectors in accuracy and speed, are not optimized for detecting small objects such as malaria parasites in microscopic images. We modify these models by increasing the feature scale and adding more detection layers, which enhances their ability to detect small objects without notably decreasing detection speed. We propose one modified YOLOV4 model, called YOLOV4-MOD, and two modified YOLOV3 models, called YOLOV3-MOD1 and YOLOV3-MOD2. In addition, new anchor box sizes are generated using the K-means clustering algorithm to exploit the potential of these models in small object detection. The modified YOLOV3 and YOLOV4 models were evaluated on a publicly available malaria dataset. They achieve state-of-the-art accuracy, exceeding the performance of their original versions, Faster R-CNN, and SSD in terms of mean average precision (mAP), recall, precision, F1 score, and average IOU. YOLOV4-MOD achieved the best detection accuracy of all the models, with a mAP of 96.32%; YOLOV3-MOD2 and YOLOV3-MOD1 achieved mAPs of 96.14% and 95.46%, respectively.

Conclusions: The experimental results of this study demonstrate that the modified YOLOV3 and YOLOV4 models are highly promising for detecting malaria parasites in images captured by a smartphone camera over the microscope eyepiece. The proposed system is suitable for deployment in low-resource settings.
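
The abstract says only that new anchor box sizes are generated with K-means. A common way to do this for YOLO-style detectors, assumed here rather than confirmed by the abstract, is to cluster ground-truth (width, height) pairs with the distance 1 - IoU popularized by YOLOv2. The Python sketch below illustrates that variant on made-up box sizes; the function names and data are hypothetical.

```python
import random
from typing import List, Tuple

Box = Tuple[float, float]  # (width, height) of a ground-truth box


def iou_wh(a: Box, b: Box) -> float:
    """IoU of two boxes that share the same center, so only width and height matter."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes: List[Box], k: int, iters: int = 100, seed: int = 0) -> List[Box]:
    """K-means on box sizes with distance 1 - IoU; returns anchors sorted by area."""
    rng = random.Random(seed)
    anchors = rng.sample(boxes, k)
    for _ in range(iters):
        clusters: List[List[Box]] = [[] for _ in range(k)]
        for b in boxes:
            # The smallest 1 - IoU distance is the same as the largest IoU.
            best = max(range(k), key=lambda j: iou_wh(b, anchors[j]))
            clusters[best].append(b)
        new_anchors = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c)) if c else anchors[j]
            for j, c in enumerate(clusters)
        ]
        if new_anchors == anchors:  # assignments stopped changing
            break
        anchors = new_anchors
    return sorted(anchors, key=lambda a: a[0] * a[1])


boxes = [(10, 10), (12, 11), (11, 12), (50, 60), (55, 58), (52, 62)]
print(kmeans_anchors(boxes, k=2))  # one small and one large anchor
```

Using 1 - IoU instead of Euclidean distance keeps large boxes from dominating the clustering, which matters when, as here, the objects of interest are small.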


WLeidenRDF: RDF Data Query Method based on Semantic-Enhanced Graph-Clustering Algorithm

2020 International Symposium on Theoretical Aspects of Software Engineering (TASE) ◽

10.1109/tase49443.2020.00014 ◽

2020 ◽

Author(s):

Liu Yang ◽

Zhou Chen ◽

Yiqing Feng ◽

Zhifang Liao ◽

Zhigang Hu ◽

...

Keyword(s):

Clustering Algorithm ◽

Graph Clustering ◽

Data Query ◽

Rdf Data


Efficient Vector Partitioning Algorithms for Graph Clustering

Journal of Data Intelligence ◽

10.26421/jdi1.2-1 ◽

2020 ◽

Vol 1 (2) ◽

pp. 101-123

Author(s):

Hiroaki Shiokawa ◽

Yasunori Futamura

Keyword(s):

Social Networks ◽

Large Scale ◽

Clustering Algorithm ◽

Ground Truth ◽

Graph Clustering ◽

Mining Communities ◽

Fine Grained ◽

Efficient Vector ◽

Public Datasets ◽

Many Core

This paper addresses the problem of finding clusters in graph-structured data such as Web graphs and social networks. Graph clustering is a fundamental technique for understanding the structures present in such complex graphs. In the Web and data mining communities, modularity-based graph clustering algorithms have been used successfully in many applications. However, modularity-based methods have difficulty finding fine-grained clusters hidden in large-scale graphs, and they fail to reproduce the ground truth. In this paper, we present a novel modularity-based algorithm, CAV, that yields better clustering results than the traditional algorithm. The proposed algorithm incorporates cohesiveness-aware vector partitioning into graph spectral analysis to improve clustering accuracy. This paper also presents P-CAV, an efficient extension of CAV that further improves clustering speed through thread-based parallelization on a many-core CPU. Our extensive experiments on synthetic and public datasets demonstrate the superior performance of our approaches over state-of-the-art approaches.
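
Since CAV is described as modularity-based, it may help to see the quantity such methods evaluate. The Python sketch below computes the standard Newman-Girvan modularity Q = sum_c [L_c / m - (d_c / 2m)^2] of a given partition, where L_c is the number of edges inside community c and d_c is its total degree. It does not implement CAV's cohesiveness-aware vector partitioning; the function and toy graph are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


def modularity(edges: List[Tuple[int, int]], community: Dict[int, Hashable]) -> float:
    """Newman-Girvan modularity of a partition of an undirected simple graph."""
    m = len(edges)
    internal: Dict[Hashable, int] = defaultdict(int)    # L_c: edges inside community c
    degree_sum: Dict[Hashable, int] = defaultdict(int)  # d_c: total degree of community c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two triangles joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
merged = {v: "A" for v in range(6)}
print(modularity(edges, split), modularity(edges, merged))  # about 0.357 (= 5/14) and 0.0
```

Splitting along the bridge scores 5/14, while lumping everything together scores 0. Because Q rewards communities relative to a random-graph baseline, maximizing it tends to favor fairly large clusters, which is consistent with the abstract's observation that modularity-based methods struggle with fine-grained clusters.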


A Local Graph Clustering Algorithm for Discovering Subgoals in Reinforcement Learning

Communication and Networking - Communications in Computer and Information Science ◽

10.1007/978-3-642-17604-3_5 ◽

2010 ◽

pp. 41-50 ◽

Cited By ~ 1

Author(s):

Negin Entezari ◽

Mohammad Ebrahim Shiri ◽

Parham Moradi

Keyword(s):

Reinforcement Learning ◽

Clustering Algorithm ◽

Graph Clustering ◽

Local Graph
