On a two-truths phenomenon in spectral graph clustering

2019 ◽  
Vol 116 (13) ◽  
pp. 5995-6000 ◽  
Author(s):  
Carey E. Priebe ◽  
Youngser Park ◽  
Joshua T. Vogelstein ◽  
John M. Conroy ◽  
Vince Lyzinski ◽  
...  

Clustering is concerned with coherently grouping observations without any explicit concept of true groupings. Spectral graph clustering (clustering the vertices of a graph based on their spectral embedding) is commonly approached via K-means (or, more generally, Gaussian mixture model) clustering composed with either Laplacian spectral embedding (LSE) or adjacency spectral embedding (ASE). Recent theoretical results provide deeper understanding of the problem and solutions and lead us to a “two-truths” LSE vs. ASE spectral graph clustering phenomenon convincingly illustrated here via a diffusion MRI connectome dataset: The different embedding methods yield different clustering results, with LSE capturing left hemisphere/right hemisphere affinity structure and ASE capturing gray matter/white matter core–periphery structure.
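
The two embeddings named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes an undirected graph given as a symmetric adjacency matrix, takes LSE to mean an eigendecomposition of the normalized matrix D^{-1/2} A D^{-1/2}, and keeps the top `d` eigenpairs by absolute eigenvalue. The function names and the choice of `d` are hypothetical. The clustering step (K-means or a Gaussian mixture model fitted to the rows) is left to a library of your choice.

```python
import numpy as np

def _scaled_top_eigenvectors(M, d):
    """Top-d eigenvectors of symmetric M (by |eigenvalue|), scaled by sqrt(|eigenvalue|)."""
    vals, vecs = np.linalg.eigh(M)
    order = np.argsort(np.abs(vals))[::-1][:d]
    return vecs[:, order] * np.sqrt(np.abs(vals[order]))

def adjacency_spectral_embedding(A, d):
    """ASE sketch: embed the vertices using the adjacency matrix itself."""
    A = np.asarray(A, dtype=float)
    return _scaled_top_eigenvectors(A, d)

def laplacian_spectral_embedding(A, d):
    """LSE sketch: embed the vertices using D^{-1/2} A D^{-1/2} (an assumed definition)."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    mask = deg > 0                      # isolated vertices keep a zero scale factor
    inv_sqrt[mask] = deg[mask] ** -0.5
    L = inv_sqrt[:, None] * A * inv_sqrt[None, :]
    return _scaled_top_eigenvectors(L, d)
```

Each function returns an `n × d` array with one row per vertex. Fitting the same clustering model to the rows of each array is one way to compare the groupings the two embeddings produce on a given graph.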

2020 ◽  
Vol 34 (04) ◽  
pp. 4215-4222
Author(s):  
Binyuan Hui ◽  
Pengfei Zhu ◽  
Qinghua Hu

Graph convolutional networks (GCN) have achieved promising performance in attributed graph clustering and semi-supervised node classification because they are capable of modeling complex graphical structure and of jointly learning both features and relations of nodes. Inspired by the success of unsupervised learning in the training of deep models, we ask whether graph-based unsupervised learning can collaboratively boost the performance of semi-supervised learning. In this paper, we propose a multi-task graph learning model, called collaborative graph convolutional networks (CGCN). CGCN is composed of an attributed graph clustering network and a semi-supervised node classification network. As Gaussian mixture models can effectively discover the inherent complex data distributions, a new end-to-end attributed graph clustering network is designed by combining a variational graph auto-encoder with Gaussian mixture models (GMM-VGAE) rather than the classic k-means. If the pseudo-label of an unlabeled sample assigned by GMM-VGAE is consistent with the prediction of the semi-supervised GCN, the sample is selected, together with its pseudo-label, to further boost the performance of semi-supervised learning. Extensive experiments on benchmark graph datasets validate the superiority of our proposed GMM-VGAE compared with the state-of-the-art attributed graph clustering networks. The performance of node classification is greatly improved by our proposed CGCN, which verifies that graph-based unsupervised learning can be well exploited to enhance the performance of semi-supervised learning.
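
The pseudo-label selection rule described in this abstract can be sketched in a few lines. This is an illustrative assumption rather than the CGCN implementation: it presumes the cluster ids from the clustering network have already been aligned with the class ids used by the classifier (the abstract does not say how), and the function and argument names are hypothetical.

```python
def select_consistent_pseudo_labels(cluster_labels, gcn_predictions, unlabeled_indices):
    """Return {node_index: label} for unlabeled nodes on which both models agree.

    cluster_labels[i]  : label assigned to node i by the clustering network
                         (assumed already mapped onto the classifier's class ids)
    gcn_predictions[i] : class predicted for node i by the semi-supervised GCN
    """
    return {
        i: gcn_predictions[i]
        for i in unlabeled_indices
        if cluster_labels[i] == gcn_predictions[i]
    }
```

The returned nodes could then be added to the labeled set for a further round of training, while nodes on which the two networks disagree stay unlabeled.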


2013 ◽  
Vol 760-762 ◽  
pp. 1556-1561
Author(s):  
Ting Wei Du ◽  
Bo Liu

Indoor scene understanding based on depth image data is a cutting-edge issue in three-dimensional computer vision. Taking into account the layout characteristics of indoor scenes and the abundance of planar features in them, this paper presents a depth image segmentation method based on Gaussian mixture model clustering. First, the Kinect depth image data are transformed into a point cloud of discrete three-dimensional points, which is then denoised and down-sampled; second, the normal at every point in the cloud is computed and the normals are clustered using a Gaussian mixture model; finally, the point cloud is segmented using the RANSAC algorithm. Experimental results show that the segmented regions have clear boundaries and that the segmentation quality is above normal, laying a good foundation for object recognition.
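
The final step of this pipeline, RANSAC plane extraction, can be sketched in plain Python. This is a generic textbook RANSAC plane fit and not the authors' implementation; the distance threshold, iteration count, and function names are assumptions, and the earlier steps (normal estimation and Gaussian mixture clustering of the normals) are not shown.

```python
import random

def plane_from_points(p, q, r):
    """Return (unit_normal, offset) of the plane through three points, or None if collinear."""
    ux, uy, uz = q[0] - p[0], q[1] - p[1], q[2] - p[2]
    vx, vy, vz = r[0] - p[0], r[1] - p[1], r[2] - p[2]
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    norm = (nx * nx + ny * ny + nz * nz) ** 0.5
    if norm < 1e-12:
        return None
    n = (nx / norm, ny / norm, nz / norm)
    d = -(n[0] * p[0] + n[1] * p[1] + n[2] * p[2])
    return n, d

def ransac_plane(points, threshold=0.01, iterations=200, seed=0):
    """Find the plane supported by the most points; return (plane, inlier_indices)."""
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue  # degenerate sample, try again
        n, d = plane
        inliers = [
            i for i, pt in enumerate(points)
            if abs(n[0] * pt[0] + n[1] * pt[1] + n[2] * pt[2] + d) <= threshold
        ]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers
```

To segment several planes, one common approach is to remove the inliers of the best plane and run the search again on the remaining points, either on the whole cloud or separately within each cluster of normals.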


2019 ◽  
Vol 488 (3) ◽  
pp. 3810-3817 ◽  
Author(s):  
Jade Powell ◽  
Simon Stevenson ◽  
Ilya Mandel ◽  
Peter Tiňo

The mass and spin distributions of compact binary gravitational-wave sources are currently uncertain due to complicated astrophysics involved in their formation. Multiple sub-populations of compact binaries representing different evolutionary scenarios may be present amongst sources detected by Advanced LIGO and Advanced Virgo. In addition to hierarchical modelling, unmodelled methods can aid in determining the number of sub-populations and their properties. In this paper, we apply Gaussian mixture model clustering to 1000 simulated gravitational-wave compact binary sources from a mixture of five sub-populations. Using both mass and spin as input parameters, we determine how many binary detections are needed to accurately determine the number of sub-populations and their mass and spin distributions. In the most difficult case that we consider, where two sub-populations have identical mass distributions but differ in their spin, which is poorly constrained by gravitational-wave detections, we find that ∼400 detections are needed before we can identify the correct number of sub-populations.


2021 ◽  
Author(s):  
Kingshuk Mukherjee ◽  
Massimiliano Rossi ◽  
Daniel Dole-Muinos ◽  
Ayomide Ajayi ◽  
Mattia Prosperi ◽  
...  

Optical mapping is a method for creating high-resolution restriction maps of an entire genome. Optical mapping has been largely automated, and first produces single-molecule restriction maps, called Rmaps, which are assembled to generate genome-wide optical maps. Since the location and orientation of each Rmap is unknown, the first problem in the analysis of this data is finding related Rmaps, i.e., pairs of Rmaps that share the same orientation and have significant overlap in their genomic location. Although heuristics for identifying related Rmaps exist, they all require quantization of the data, which leads to a loss of precision. In this paper, we propose a method based on Gaussian mixture model clustering, which we refer to as OMclust, that finds overlapping Rmaps without quantization. Using both simulated and real datasets, we show that OMclust substantially improves the precision (from 48.3% to 73.3%) over the state-of-the-art methods while also reducing CPU time and memory consumption. Further, we integrated OMclust into the error correction methods (Elmeri and cOMet) to demonstrate the increase in the performance of these methods. When OMclust was combined with cOMet to error correct Rmap data generated from human DNA, it was able to error correct close to 3x more Rmaps, and reduced the CPU time by more than 35x. Our software is written in C++ and is publicly available under GNU General Public License at https://github.com/kingufl/OMclust


2019 ◽  
Vol 178 ◽  
pp. 84-97 ◽  
Author(s):  
Wenzhen Jia ◽  
Yanyan Tan ◽  
Li Liu ◽  
Jing Li ◽  
Huaxiang Zhang ◽  
...  
