Embedding-based Silhouette community detection

Blaž Škrlj; Jan Kralj; Nada Lavrač

doi:10.1007/s10994-020-05882-8

Machine Learning ◽

10.1007/s10994-020-05882-8 ◽

2020 ◽

Vol 109 (11) ◽

pp. 2161-2193 ◽

Cited By ~ 1

Author(s):

Blaž Škrlj ◽

Jan Kralj ◽

Nada Lavrač

Keyword(s):

Community Detection ◽

Real Life ◽

Interaction Network ◽

Subgroup Discovery ◽

Complex Data ◽

Detection Algorithms ◽

Scientific Disciplines ◽

Network Communities ◽

Art Community ◽

Node Embeddings

Mining complex data in the form of networks is of increasing interest in many scientific disciplines. Network communities correspond to densely connected subnetworks, and often represent key functional parts of real-world systems. This paper proposes embedding-based Silhouette community detection (SCD), an approach for detecting communities based on clustering of network node embeddings, i.e., real-valued representations of nodes derived from their neighborhoods. We investigate the performance of the proposed SCD approach on 234 synthetic networks, as well as on a real-life social network. Even though SCD is not based on any form of modularity optimization, it performs comparably to or better than state-of-the-art community detection algorithms, such as InfoMap and Louvain. Further, we demonstrate that SCD’s outputs can be used along with domain ontologies in semantic subgroup discovery, yielding human-understandable explanations of communities detected in a real-life protein interaction network. Being embedding-based, SCD is widely applicable and can be tested out-of-the-box as part of many existing network learning and exploration pipelines.
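
The abstract above does not spell out how the Silhouette criterion enters SCD, so the following is only a minimal sketch of the standard silhouette coefficient, which scores how well a clustering separates a set of vectors. The toy 2-D points stand in for node embeddings, and the function names are hypothetical, not taken from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(points, labels):
    """Mean silhouette coefficient of a clustering (higher is better)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # common convention: a singleton cluster scores 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Toy 2-D vectors standing in for node embeddings (hypothetical data).
embeddings = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
good = silhouette_score(embeddings, [0, 0, 1, 1])  # groups nearby points
bad = silhouette_score(embeddings, [0, 1, 0, 1])   # mixes distant points
print(good > bad)
```

A clustering that groups nearby vectors scores close to 1, while one that mixes distant vectors scores below 0, which is what makes the coefficient usable for comparing candidate partitions.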

Recent trends on community detection algorithms: A survey

Modern Physics Letters B ◽

10.1142/s0217984920504084 ◽

2020 ◽

Vol 34 (35) ◽

pp. 2050408

Author(s):

Sumit Gupta ◽

Dhirendra Pratap Singh

Keyword(s):

Community Detection ◽

Real Life ◽

Detection Algorithm ◽

Data Sets ◽

Detection Algorithms ◽

Life Problems ◽

Application Data ◽

Art Community ◽

Community Detection Algorithm ◽

Two Parameters

Many real-life problems and application data can be represented as graphs. As technology advances rapidly, applications generate vast amounts of valuable data, which increases the size of the graphs that represent them. How to extract meaningful information from these data has become a hot research topic, and methodical algorithms are required to do so. These unstructured graphs are not scattered in nature; they show relationships between their basic entities. Identifying communities based on these relationships improves the understanding of the applications represented by the graphs. Community detection algorithms are one solution: they divide the graph into small clusters in which nodes are densely connected within a cluster and sparsely connected across clusters. During the last decade, many algorithms have been proposed, which fall into two broad categories: non-overlapping and overlapping community detection algorithms. The goal of this paper is to offer a comparative analysis of the various community detection algorithms. We bring together the state-of-the-art community detection algorithms from these two classes in a single article, along with their accessible benchmark data sets. Finally, we present a comparison of these algorithms with respect to two parameters: time efficiency and how accurately the communities are detected.

Optimizing community detection in social networks using antlion and K-median

Bulletin of Electrical Engineering and Informatics ◽

10.11591/eei.v8i4.1196 ◽

2019 ◽

Vol 8 (4) ◽

Author(s):

Amany A. Naem ◽

Neveen I. Ghali

Keyword(s):

Social Networks ◽

Objective Function ◽

Community Detection ◽

Optimization Problem ◽

Real Life ◽

Optimization Methods ◽

Population Based ◽

Detection Problem ◽

Network Communities ◽

Antlion Optimization

Antlion Optimization (ALO) is one of the latest population-based optimization methods and has shown good performance in a variety of applications. The ALO algorithm mimics the way antlions hunt ants in nature. Community detection in social networks is crucial to understanding the concepts of the networks. Identifying network communities can be viewed as a problem of clustering a set of nodes into communities. k-median clustering is one of the popular techniques applied in clustering. The problem of clustering a network can be formalized as an optimization problem in which the objective function to be optimized is a quality function that captures the intuition of a cluster as a set of nodes with better internal connectivity than external connectivity. In this paper, a combination of antlion optimization and k-median for solving the community detection problem is proposed and named K-median Modularity ALO. Experimental results on real-life networks show that the combination of antlion optimization and k-median successfully detects an optimized community structure when modularity is used as the objective function.
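
To make the objective concrete: the abstract names modularity as the function being optimized but does not give its form. A minimal sketch, assuming the standard Newman modularity for an undirected, unweighted graph, is shown below; the function name and the toy graph are hypothetical and not taken from the paper.

```python
def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to its community label.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph without edges")
    internal = {}    # number of edges with both ends in the community
    degree_sum = {}  # sum of node degrees in the community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    # Q = sum over communities of (internal edge share - expected share)
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Two triangles joined by a single bridge edge (hypothetical toy graph).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
merged = {node: "A" for node in range(6)}
print(modularity(edges, split), modularity(edges, merged))
```

Splitting the graph at the bridge yields a higher Q than placing all nodes in one community, which is the kind of signal an optimizer such as ALO would follow.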

Multiobjective Group Search Optimization Approach for Community Detection in Networks

International Journal of Applied Evolutionary Computation ◽

10.4018/ijaec.2016070103 ◽

2016 ◽

Vol 7 (3) ◽

pp. 50-70 ◽

Cited By ~ 1

Author(s):

Nidhi Arora ◽

Hema Banati

Keyword(s):

Community Detection ◽

Optimization Techniques ◽

Optimization Approach ◽

Multiple Features ◽

Detection Algorithms ◽

Search Optimization ◽

Art Community ◽

Group Search ◽

Single Objective ◽

Connected Communities

Various evolutionary approaches have been extensively applied to evolve densely connected communities in complex networks. However, these techniques have primarily been single-objective optimization techniques, which optimize only a specific feature of the network while missing other important features. Multiobjective optimization techniques can overcome this drawback by simultaneously optimizing multiple features of a network. This paper proposes MGSO, a multiobjective variant of the Group Search Optimization (GSO) algorithm, to globally search and evolve densely connected communities. It uses the inherent animal food-searching behavior of GSO to simultaneously optimize two negatively correlated objective functions and overcomes the drawbacks of single-objective CD algorithms. The algorithm reduces random initializations, which results in fast convergence. It was applied to 6 real-world and 33 synthetic network datasets, and the results were compared with various state-of-the-art community detection algorithms. The results show the efficacy of MGSO in finding accurate community structures.

Towards effective discovery of natural communities in complex networks and implications in e-commerce

Electronic Commerce Research ◽

10.1007/s10660-019-09395-y ◽

2020 ◽

Cited By ~ 1

Author(s):

Swarup Chattopadhyay ◽

Tanmay Basu ◽

Asit K. Das ◽

Kuntal Ghosh ◽

Late C. A. Murthy

Keyword(s):

Complex Networks ◽

Community Detection ◽

Similarity Measure ◽

Recommender System ◽

Data Clustering ◽

Ground Truth ◽

Normalized Mutual Information ◽

Detection Algorithms ◽

Node Similarity ◽

Art Community

Automated community detection is an important problem in the study of complex networks. The idea of community detection is closely related to the concept of data clustering in pattern recognition. Data clustering refers to the task of grouping similar objects and segregating dissimilar objects. The community detection problem can be thought of as finding groups of densely interconnected nodes with few connections to nodes outside the group. A node similarity measure is proposed here that finds the similarity between two nodes by considering both neighbors and non-neighbors of these two nodes. Subsequently, a method is introduced for identifying communities in complex networks using this node similarity measure and the notion of data clustering. The significant characteristic of the proposed method is that it does not need any prior knowledge about the actual communities of a network. Extensive experiments on several real-world and artificial networks with known ground-truth communities are reported. The proposed method is compared with various state-of-the-art community detection algorithms using several criteria, viz. normalized mutual information, F-measure, etc. Moreover, it has been successfully applied to improve the effectiveness of a recommender system, a tool that is rapidly becoming crucial in e-commerce applications. The empirical results suggest that the proposed technique has the potential to improve the performance of a recommender system and hence may be useful for other e-commerce applications.
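
Normalized mutual information, one of the evaluation criteria named above, compares a detected partition with the ground truth. The abstract does not say which normalization is used; the sketch below assumes the common arithmetic-mean variant, NMI = 2·I(A;B) / (H(A) + H(B)), and the function name is hypothetical.

```python
import math
from collections import Counter


def normalized_mutual_information(labels_a, labels_b):
    """NMI between two labelings of the same nodes, in [0, 1]."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("labelings must be non-empty and of equal length")
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    # Mutual information I(A;B) from the joint and marginal label counts.
    mutual_info = sum(
        (n_ab / n) * math.log(n_ab * n / (count_a[a] * count_b[b]))
        for (a, b), n_ab in joint.items()
    )
    entropy_a = -sum((c / n) * math.log(c / n) for c in count_a.values())
    entropy_b = -sum((c / n) * math.log(c / n) for c in count_b.values())
    if entropy_a + entropy_b == 0:
        return 1.0  # both labelings put every node in a single community
    return 2 * mutual_info / (entropy_a + entropy_b)


truth = [0, 0, 1, 1]
print(normalized_mutual_information(truth, [1, 1, 0, 0]))  # same split, renamed
print(normalized_mutual_information(truth, [0, 1, 0, 1]))  # unrelated split
```

The score depends only on how nodes are grouped, not on the label names, so a detected partition can be compared with ground truth without first matching community identifiers.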

The Eminence of Co-Expressed Ties in Schizophrenia Network Communities

Data ◽

10.3390/data4040149 ◽

2019 ◽

Vol 4 (4) ◽

pp. 149

Author(s):

Amulyashree Sridhar ◽

Sharvani GS ◽

AH Manjunatha Reddy ◽

Biplab Bhattacharjee ◽

Kalyan Nagaraj

Keyword(s):

Community Detection ◽

Biological Networks ◽

Gene Networks ◽

Biological Interactions ◽

K Nearest Neighbors ◽

Nonlinear Network ◽

Detection Algorithms ◽

Network Communities ◽

Disease Condition ◽

Modularity Maximization

Exploring gene networks is crucial for identifying significant biological interactions occurring in a disease condition. These interactions can be recognized by modeling the tie structure of networks. Such tie orientations are often detected within embedded community structures. However, most of the prevailing community detection modules are intended to capture information from nodes and their attributes, usually ignoring the ties. In this study, a modularity maximization algorithm is proposed based on a nonlinear representation of local tangent space alignment (LTSA). Initially, the tangent coordinates are computed locally to identify k-nearest neighbors across the genes. These local neighbors are further optimized by generating a nonlinear network embedding function for detecting gene communities based on eigenvector decomposition. Experimental results suggest that this algorithm detects gene modules with a better modularity index, 0.9256, than other traditional community detection algorithms. Furthermore, co-expressed genes across these communities are identified by discovering the characteristic tie structures. These detected ties are known to have substantial biological influence on the progression of schizophrenia, thereby signifying the influence of tie patterns in biological networks. This technique can be extended logically to other disease networks for detecting substantial gene “hotspots”.

Evaluating the role of community detection in improving influence maximization heuristics

Social Network Analysis and Mining ◽

10.1007/s13278-021-00804-5 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

László Hajdu ◽

Miklós Krész ◽

András Bóta

Keyword(s):

Community Detection ◽

Real Life ◽

Detection Algorithm ◽

Influence Maximization ◽

Overlapping Community Detection ◽

Detection Algorithms ◽

Overlapping Community ◽

Community Detection Algorithm ◽

The Cost

Both community detection and influence maximization are well-researched fields of network science. Here, we investigate how several popular community detection algorithms can be used as part of a heuristic approach to influence maximization. The heuristic is based on the community value, a node-based metric defined on the outputs of overlapping community detection algorithms. This metric is used to select nodes as high influence candidates for expanding the set of influential nodes. Our aim in this paper is twofold. First, we evaluate the performance of eight frequently used overlapping community detection algorithms on this specific task to show how much improvement can be gained compared to the originally proposed method of Kempe et al. Second, selecting the community detection algorithm(s) with the best performance, we propose a variant of the influence maximization heuristic with significantly reduced runtime, at the cost of slightly reduced quality of the output. We use both artificial benchmarks and real-life networks to evaluate the performance of our approach.
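
The abstract cites the method of Kempe et al. as a baseline without describing it. That work is commonly associated with greedy seed selection under a diffusion model such as the independent cascade, and the sketch below illustrates that general idea only. It is not the community-value heuristic of this paper; the function names, the uniform activation probability, and the toy graph are all assumptions.

```python
import random


def simulate_cascade(graph, seeds, prob, rng):
    """One independent-cascade run; returns the number of activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, ()):
                # Each newly active node gets one chance per out-neighbor.
                if v not in active and rng.random() < prob:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def greedy_seeds(graph, k, prob=0.1, runs=200, seed=0):
    """Greedily pick k seeds, each maximizing the estimated spread."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    chosen = []
    for _ in range(k):
        best_node, best_spread = None, -1.0
        for cand in sorted(nodes - set(chosen)):
            # Monte Carlo estimate of the spread with the candidate added.
            spread = sum(
                simulate_cascade(graph, chosen + [cand], prob, rng)
                for _ in range(runs)
            ) / runs
            if spread > best_spread:
                best_node, best_spread = cand, spread
        if best_node is None:
            break  # fewer than k nodes available
        chosen.append(best_node)
    return chosen


# Hypothetical directed toy graph given as adjacency lists.
toy = {"a": ["b", "c"], "b": ["d"], "c": [], "d": [], "e": ["f"], "f": []}
print(greedy_seeds(toy, k=2, prob=1.0, runs=1))
```

With `prob=1.0` the cascade is deterministic, so the example reduces to picking the nodes that reach the most other nodes; with smaller probabilities the estimate depends on the number of Monte Carlo runs, which is the main source of the runtime cost that heuristics try to avoid.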

Determining Network Communities Based on Modular Density Optimization

Recent Advances in Computer Science and Communications ◽

10.2174/2213275912666181205153024 ◽

2020 ◽

Vol 13 (2) ◽

pp. 128-136 ◽

Cited By ~ 1

Author(s):

Seema Rani ◽

Monica Mehrotra

Keyword(s):

Community Detection ◽

Bat Algorithm ◽

Limit Problem ◽

Resolution Limit ◽

Community Discovery ◽

Detection Algorithms ◽

Network Communities ◽

Optimization Function ◽

Np Hard Problem ◽

Real World Datasets

Background: Complex systems are commonly modeled as networks. Communities that inherently exist in networks play a recognizable role in understanding the organization of those networks. Community discovery in networks has attracted the attention of researchers from multiple disciplines. The community detection problem has been modeled as an optimization problem. Broadly, existing community detection algorithms have adopted modularity as the optimization function. However, modularity is not able to identify communities that are small relative to the size of the network. Methods: This paper addresses the problem of the resolution limit posed by modularity. The modular density measure succeeds in countering the resolution limit problem. Finding network communities with maximum modular density is an NP-hard problem. In this work, the discrete bat algorithm with modular density as the optimization function is proposed. Results: Experiments are conducted on three real-world datasets. To assess consistency, ten independent runs of the proposed algorithm have been carried out. The experimental results show that the proposed algorithm produces high-quality community structure that includes small communities. Conclusion: The results are compared with traditional and evolutionary community detection algorithms. The final outcome shows the superiority of the discrete bat algorithm with modular density as the optimization function with respect to number of communities, maximum modularity, and average modularity.
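
The abstract does not state the modular density formula. A minimal sketch, assuming the commonly cited definition D = Σ_c (2·internal_c − external_c) / |V_c| for an undirected, unweighted graph, is given below; the function name and the toy graph are hypothetical and not taken from the paper.

```python
from collections import Counter


def modular_density(edges, community_of):
    """Modular density D of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to its community label.
    """
    size = Counter(community_of.values())  # number of nodes per community
    internal = Counter()  # edges with both ends inside the community
    external = Counter()  # edges with exactly one end inside the community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        if cu == cv:
            internal[cu] += 1
        else:
            external[cu] += 1
            external[cv] += 1
    # Each community contributes (2 * internal - external) per member node.
    return sum((2 * internal[c] - external[c]) / n for c, n in size.items())


# Two triangles joined by a single bridge edge (hypothetical toy graph).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
merged = {node: "A" for node in range(6)}
print(modular_density(edges, split), modular_density(edges, merged))
```

Because each community's contribution is divided by its own size rather than by a quantity tied to the whole network, small dense groups are not penalized for being small, which is the property the abstract relies on when it says the measure counters the resolution limit.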

A Distributed Bagging Ensemble Methodology for Community Prediction in Social Networks

Information ◽

10.3390/info11040199 ◽

2020 ◽

Vol 11 (4) ◽

pp. 199 ◽

Cited By ~ 3

Author(s):

Christos Makris ◽

Georgios Pispirigos ◽

Ioannis Orestis Rizos

Keyword(s):

Community Detection ◽

Real Life ◽

Ensemble Methods ◽

Detection Algorithms ◽

Edge Betweenness ◽

Social Graphs ◽

Published Research ◽

Vertex Centrality ◽

Scientific Fields ◽

Bagging Ensemble

Presently, due to the extended availability of gigantic information networks and the beneficial application of graph analysis in various scientific fields, the need for efficient and highly scalable community detection algorithms has never been greater. Despite the significant amount of published research, the existing methods (such as Girvan–Newman, random-walk edge betweenness, vertex centrality, InfoMap, and spectral clustering) have virtually been proven incapable of handling real-life social graphs due to intrinsic computational restrictions that lead to mediocre performance and poor scalability. The purpose of this article is to introduce a novel, distributed community detection methodology which, in accordance with the community prediction concept, leverages the reduced complexity and the decreased variance of bagging ensemble methods to unveil the subjacent community hierarchy. The proposed approach has been thoroughly tested, meticulously compared against different classic community detection algorithms, and practically proven exceptionally scalable, eminently efficient, and promisingly accurate in unfolding the underlying community structure.

Efficient Estimation of Network Games of Incomplete Information: Application to Large Online Social Networks

Management Science ◽

10.1287/mnsc.2020.3885 ◽

2021 ◽

Author(s):

Xi Chen ◽

Ralf van der Lans ◽

Michael Trusov

Keyword(s):

Social Networks ◽

Social Influence ◽

Incomplete Information ◽

Community Detection ◽

Online Social Networks ◽

Choice Model ◽

Real Life ◽

Efficient Estimation ◽

Data Set ◽

Detection Algorithms

This paper presents a structural discrete choice model with social influence for large-scale social networks. The model is based on an incomplete information game and permits individual-specific parameters of consumers. It is challenging to apply this type of model to real-life scenarios for two reasons: (1) the computation of the Bayesian–Nash equilibrium is highly demanding; and (2) the identification of social influence requires the use of excluded variables that are often unavailable. To address these challenges, we derive the unique equilibrium conditions of the game, which allow us to employ a stochastic Bayesian estimation procedure that is scalable to large social networks. To facilitate the identification, we utilize community-detection algorithms to divide the network into different groups that, in turn, can be used to construct excluded variables. We validate the proposed structural model with the login decisions of more than 25,000 users of an online social game. Importantly, this data set also contains promotions that were exogenously determined and targeted to only a subgroup of consumers. This information allows us to perform exogeneity tests to validate our identification strategy using community-detection algorithms. Finally, we demonstrate the managerial usefulness of the proposed methodology for improving the strategies of targeting influential consumers in large social networks. This paper was accepted by Matthew Shum, marketing.

When Less Is More: Systematic Analysis of Cascade-Based Community Detection

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3494563 ◽

2022 ◽

Vol 16 (4) ◽

pp. 1-22

Author(s):

Liudmila Prokhorenkova ◽

Alexey Tikhonov ◽

Nelly Litvak

Keyword(s):

Community Detection ◽

Information Diffusion ◽

Real Life ◽

Viral Marketing ◽

Systematic Analysis ◽

Less Is More ◽

Detection Algorithms ◽

Underlying Network ◽

Stable Performance ◽

High Level

Information diffusion, spreading of infectious diseases, and spreading of rumors are fundamental processes occurring in real-life networks. In many practical cases, one can observe when nodes become infected, but the underlying network, over which a contagion or information propagates, is hidden. Inferring properties of the underlying network is important since these properties can be used for constraining infections, forecasting, viral marketing, and so on. Moreover, for many applications, it is sufficient to recover only coarse high-level properties of this network rather than all its edges. This article conducts a systematic and extensive analysis of the following problem: Given only the infection times, find communities of highly interconnected nodes. This task significantly differs from the well-studied community detection problem since we do not observe a graph to be clustered. We carry out a thorough comparison between existing and new approaches on several large datasets and cover methodological challenges specific to this problem. One of the main conclusions is that the most stable performance and the most significant improvement on the current state-of-the-art are achieved by our proposed simple heuristic approaches agnostic to a particular graph structure and epidemic model. We also show that some well-known community detection algorithms can be enhanced by including edge weights based on the cascade data.
