A Distributed Bagging Ensemble Methodology for Community Prediction in Social Networks

Christos Makris; Georgios Pispirigos; Ioannis Orestis Rizos

doi:10.3390/info11040199

A Distributed Bagging Ensemble Methodology for Community Prediction in Social Networks

Information ◽

10.3390/info11040199 ◽

2020 ◽

Vol 11 (4) ◽

pp. 199 ◽

Cited By ~ 3

Author(s):

Christos Makris ◽

Georgios Pispirigos ◽

Ioannis Orestis Rizos

Keyword(s):

Community Detection ◽

Real Life ◽

Ensemble Methods ◽

Detection Algorithms ◽

Edge Betweenness ◽

Social Graphs ◽

Published Research ◽

Vertex Centrality ◽

Scientific Fields ◽

Bagging Ensemble

Presently, due to the extended availability of gigantic information networks and the beneficial application of graph analysis in various scientific fields, the need for efficient and highly scalable community detection algorithms has never been greater. Despite the significant amount of published research, existing methods (such as Girvan–Newman, random-walk edge betweenness, vertex centrality, InfoMap, and spectral clustering) have proven practically incapable of handling real-life social graphs, owing to intrinsic computational restrictions that lead to mediocre performance and poor scalability. The purpose of this article is to introduce a novel, distributed community detection methodology which, in accordance with the community prediction concept, leverages the reduced complexity and decreased variance of bagging ensemble methods to unveil the subjacent community hierarchy. The proposed approach has been thoroughly tested, meticulously compared against different classic community detection algorithms, and shown in practice to be exceptionally scalable, eminently efficient, and promisingly accurate in unfolding the underlying community structure.
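The abstract above does not spell out the procedure, so the following is only a minimal sketch of the general idea of community prediction with bagging, not the authors' implementation. It assumes that each edge is classified as intra- or inter-community from a single hypothetical feature (the Jaccard overlap of the endpoints' closed neighbourhoods), that the weak learners are one-threshold decision stumps, and that communities are the connected components left after dropping edges predicted to be inter-community. The toy graph, the labels, and all function names are illustrative assumptions.

```python
import random
from itertools import combinations

def jaccard(adj, u, v):
    """Overlap of the closed neighbourhoods of an edge's endpoints (assumed feature)."""
    nu, nv = adj[u] | {u}, adj[v] | {v}
    return len(nu & nv) / len(nu | nv)

def fit_stump(samples):
    """One-feature decision stump: predict intra-community when feature >= threshold."""
    best_t, best_err = 0.0, float("inf")
    for t in sorted({x for x, _ in samples}):
        err = sum((x >= t) != bool(y) for x, y in samples)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def fit_bagging(samples, n_estimators=25, seed=0):
    """Train each stump on a bootstrap resample (drawn with replacement)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(samples) for _ in samples])
            for _ in range(n_estimators)]

def predict(stumps, x):
    """Majority vote of the ensemble: 1 = intra-community, 0 = inter-community."""
    votes = sum(x >= t for t in stumps)
    return int(2 * votes > len(stumps))

def components(nodes, edges):
    """Connected components via union-find; used here as the final communities."""
    parent = {n: n for n in nodes}
    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n
    for u, v in edges:
        parent[find(u)] = find(v)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(sorted(g) for g in groups.values())

# Toy graph (assumption): two 4-cliques joined by the single bridge (3, 4).
edges = [e for c in (range(0, 4), range(4, 8)) for e in combinations(c, 2)] + [(3, 4)]
adj = {n: set() for n in range(8)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

# Labelled samples (feature, label); only the bridge is labelled inter-community.
samples = [(jaccard(adj, u, v), int((u, v) != (3, 4))) for u, v in edges]
stumps = fit_bagging(samples)

# Keep edges the ensemble votes intra-community, then read off the components.
kept = [(u, v) for u, v in edges if predict(stumps, jaccard(adj, u, v))]
communities = components(range(8), kept)
print(communities)
```

Training and predicting on the same toy graph is done only to keep the sketch short. Because each stump is fitted independently on its own resample, the training loop is the part that would be distributed across workers in a larger setting.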


A Distributed Hybrid Community Detection Methodology for Social Networks

Algorithms ◽

10.3390/a12080175 ◽

2019 ◽

Vol 12 (8) ◽

pp. 175

Author(s):

Konstantinos Georgiou ◽

Christos Makris ◽

Georgios Pispirigos

Keyword(s):

Community Detection ◽

Discrete Mathematics ◽

Real World Data ◽

Detection Algorithms ◽

Edge Betweenness ◽

Social Graphs ◽

Local Edge ◽

Available Information ◽

Content Information ◽

Vast Range

Nowadays, the amount of digitally available information has grown tremendously, with real-world data graphs reaching millions or even billions of vertices. Hence, community detection, where groups of vertices are formed according to a well-defined similarity measure, has never been more essential, affecting a vast range of scientific fields such as bioinformatics, sociology, discrete mathematics, nonlinear dynamics, digital marketing, and computer science. Even though an impressive amount of research has already been published to tackle this NP-hard problem, the existing methods and algorithms have proven practically inefficient and severely unscalable. In this regard, the purpose of this manuscript is to combine the network topology properties expressed by the loose similarity and the local edge betweenness, which is a currently proposed alternative to Girvan–Newman’s edge betweenness measure, with the intrinsic user content information, in order to introduce a novel and highly distributed hybrid community detection methodology. The proposed approach has been thoroughly tested on various real social graphs, compared in detail with other classic divisive community detection algorithms that serve as baselines, and shown in practice to be exceptionally scalable, highly efficient, and adequately accurate in revealing the subjacent network hierarchy.


Recent trends on community detection algorithms: A survey

Modern Physics Letters B ◽

10.1142/s0217984920504084 ◽

2020 ◽

Vol 34 (35) ◽

pp. 2050408

Author(s):

Sumit Gupta ◽

Dhirendra Pratap Singh

Keyword(s):

Community Detection ◽

Real Life ◽

Detection Algorithm ◽

Data Sets ◽

Detection Algorithms ◽

Life Problems ◽

Application Data ◽

Art Community ◽

Community Detection Algorithm ◽

Two Parameters

In today’s world, many real-life problems and application data can be represented as graphs. Technology advances rapidly, and applications generate vast amounts of valuable data, which increases the size of the graphs that represent them. How to extract meaningful information from these data has become a hot research topic, and methodical algorithms are required to do so. These unstructured graphs are not scattered in nature; rather, they show relationships between their basic entities. Identifying communities based on these relationships improves the understanding of the applications the graphs represent. Community detection algorithms are one solution: they divide the graph into small clusters whose nodes are densely connected within the cluster and sparsely connected across clusters. During the last decade, many algorithms have been proposed, which fall into two broad categories: non-overlapping and overlapping community detection algorithms. The goal of this paper is to offer a comparative analysis of the various community detection algorithms. We bring together the state-of-the-art community detection algorithms in these two classes into a single article, along with their accessible benchmark data sets. Finally, we present a comparison of these algorithms with respect to two parameters: time efficiency and how accurately the communities are detected.
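The "densely connected within, sparsely connected across" criterion can be made concrete with a small sketch. The function below is not taken from the survey. It compares, for a non-overlapping partition, the fraction of possible within-community node pairs that are edges against the same fraction for cross-community pairs. The function name and the toy graph are assumptions for illustration.

```python
def edge_densities(edges, communities):
    """Return (intra_density, inter_density) for a non-overlapping partition.

    Assumes at least one community has two or more nodes and that there are
    at least two communities; otherwise one of the denominators is zero.
    """
    label = {n: i for i, c in enumerate(communities) for n in c}
    intra = sum(label[u] == label[v] for u, v in edges)
    inter = len(edges) - intra
    # Number of node pairs that could be connected inside / across communities.
    intra_pairs = sum(len(c) * (len(c) - 1) // 2 for c in communities)
    n = len(label)
    inter_pairs = n * (n - 1) // 2 - intra_pairs
    return intra / intra_pairs, inter / inter_pairs

# Toy graph (assumption): two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
communities = [{0, 1, 2}, {3, 4, 5}]
intra_d, inter_d = edge_densities(edges, communities)
print(intra_d, inter_d)
```

A partition that matches the community structure should give an intra-community density well above the inter-community density, as it does for the two triangles here.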


Evaluating the role of community detection in improving influence maximization heuristics

Social Network Analysis and Mining ◽

10.1007/s13278-021-00804-5 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

László Hajdu ◽

Miklós Krész ◽

András Bóta

Keyword(s):

Community Detection ◽

Real Life ◽

Detection Algorithm ◽

Influence Maximization ◽

Overlapping Community Detection ◽

Detection Algorithms ◽

Overlapping Community ◽

Community Detection Algorithm ◽

The Cost

Both community detection and influence maximization are well-researched fields of network science. Here, we investigate how several popular community detection algorithms can be used as part of a heuristic approach to influence maximization. The heuristic is based on the community value, a node-based metric defined on the outputs of overlapping community detection algorithms. This metric is used to select nodes as high influence candidates for expanding the set of influential nodes. Our aim in this paper is twofold. First, we evaluate the performance of eight frequently used overlapping community detection algorithms on this specific task to show how much improvement can be gained compared to the originally proposed method of Kempe et al. Second, selecting the community detection algorithm(s) with the best performance, we propose a variant of the influence maximization heuristic with significantly reduced runtime, at the cost of slightly reduced quality of the output. We use both artificial benchmarks and real-life networks to evaluate the performance of our approach.


Efficient Estimation of Network Games of Incomplete Information: Application to Large Online Social Networks

Management Science ◽

10.1287/mnsc.2020.3885 ◽

2021 ◽

Author(s):

Xi Chen ◽

Ralf van der Lans ◽

Michael Trusov

Keyword(s):

Social Networks ◽

Social Influence ◽

Incomplete Information ◽

Community Detection ◽

Online Social Networks ◽

Choice Model ◽

Real Life ◽

Efficient Estimation ◽

Data Set ◽

Detection Algorithms

This paper presents a structural discrete choice model with social influence for large-scale social networks. The model is based on an incomplete information game and permits individual-specific parameters of consumers. It is challenging to apply this type of model to real-life scenarios for two reasons: (1) the computation of the Bayesian–Nash equilibrium is highly demanding; and (2) the identification of social influence requires the use of excluded variables that are oftentimes unavailable. To address these challenges, we derive the unique equilibrium conditions of the game, which allow us to employ a stochastic Bayesian estimation procedure that is scalable to large social networks. To facilitate the identification, we utilize community-detection algorithms to divide the network into different groups that, in turn, can be used to construct excluded variables. We validate the proposed structural model with the login decisions of more than 25,000 users of an online social game. Importantly, this data set also contains promotions that were exogenously determined and targeted to only a subgroup of consumers. This information allows us to perform exogeneity tests to validate our identification strategy using community-detection algorithms. Finally, we demonstrate the managerial usefulness of the proposed methodology for improving the strategies of targeting influential consumers in large social networks. This paper was accepted by Matthew Shum, marketing.


When Less Is More: Systematic Analysis of Cascade-Based Community Detection

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3494563 ◽

2022 ◽

Vol 16 (4) ◽

pp. 1-22

Author(s):

Liudmila Prokhorenkova ◽

Alexey Tikhonov ◽

Nelly Litvak

Keyword(s):

Community Detection ◽

Information Diffusion ◽

Real Life ◽

Viral Marketing ◽

Systematic Analysis ◽

Less Is More ◽

Detection Algorithms ◽

Underlying Network ◽

Stable Performance ◽

High Level

Information diffusion, spreading of infectious diseases, and spreading of rumors are fundamental processes occurring in real-life networks. In many practical cases, one can observe when nodes become infected, but the underlying network, over which a contagion or information propagates, is hidden. Inferring properties of the underlying network is important since these properties can be used for constraining infections, forecasting, viral marketing, and so on. Moreover, for many applications, it is sufficient to recover only coarse high-level properties of this network rather than all its edges. This article conducts a systematic and extensive analysis of the following problem: Given only the infection times, find communities of highly interconnected nodes. This task significantly differs from the well-studied community detection problem since we do not observe a graph to be clustered. We carry out a thorough comparison between existing and new approaches on several large datasets and cover methodological challenges specific to this problem. One of the main conclusions is that the most stable performance and the most significant improvement on the current state-of-the-art are achieved by our proposed simple heuristic approaches agnostic to a particular graph structure and epidemic model. We also show that some well-known community detection algorithms can be enhanced by including edge weights based on the cascade data.
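The abstract above does not describe the authors' heuristics, so the following is only a hedged illustration of how cascade data alone can yield pairwise weights without observing any graph. It assumes each cascade is a mapping from node to infection time and simply counts how often two nodes appear in the same cascade. Infection order and timing are ignored here, which is a simplification. The function name and the toy cascades are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_weights(cascades):
    """Count, for every node pair, how many cascades infected both nodes."""
    weights = Counter()
    for times in cascades:
        # Sorting the node names gives each unordered pair one canonical key.
        for u, v in combinations(sorted(times), 2):
            weights[(u, v)] += 1
    return weights

# Toy cascades (assumption): node -> infection time; no graph is observed.
cascades = [
    {"a": 0.0, "b": 1.2, "c": 2.5},
    {"b": 0.0, "a": 0.7},
    {"d": 0.0, "e": 0.4, "c": 3.1},
    {"e": 0.0, "d": 1.0},
]
weights = cooccurrence_weights(cascades)
print(weights.most_common())
```

Pairs that are repeatedly infected together receive larger weights. Such a weighted pair list could then be handed to any weighted clustering or community detection routine.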


Constructing Real-Life Benchmarks for Community Detection by Rewiring Edges

Complexity ◽

10.1155/2020/7096230 ◽

2020 ◽

Vol 2020 ◽

pp. 1-16 ◽

Cited By ~ 1

Author(s):

Jing Xiao ◽

Hong-Fei Ren ◽

Xiao-Ke Xu

Keyword(s):

Performance Evaluation ◽

Community Detection ◽

Multiple Scales ◽

Special Functions ◽

Structural Characteristics ◽

Real Life ◽

Original Network ◽

Community Structures ◽

Functional Characteristics ◽

Detection Algorithms

To make the performance evaluation of community detection algorithms more accurate and to deepen our analysis of the community structures and functional characteristics of real-life networks, a new benchmark construction method is designed that directly rewires edges in a real-life network instead of building a model. Based on this method, two kinds of novel benchmarks with special functions are proposed. The first kind can accurately approximate the microscale and mesoscale structural characteristics of the original network, providing ideal proxies for real-life networks and supporting performance analysis of community detection algorithms as a real network's characteristics vary at multiple scales. The second kind can independently vary the community intensity in each generated benchmark, making the robustness evaluation of community detection algorithms more accurate. Experimental results demonstrate the effectiveness and superiority of the proposed method. It enables more real-life networks to be used to construct benchmarks and helps to deepen our analysis of the community structures and functional characteristics of real-life networks.
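The abstract does not state which rewiring rule the authors use. As a rough illustration of what rewiring edges in an existing network can look like, the sketch below implements the standard degree-preserving double-edge swap, which is an assumption here and may differ from the paper's procedure. The function name, the attempt limit, and the toy ring graph are illustrative.

```python
import random
from collections import Counter

def double_edge_swap(edges, n_swaps, seed=0):
    """Degree-preserving rewiring: replace (a, b), (c, d) with (a, d), (c, b)."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:
            continue  # shared endpoint: the swap would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # the swap would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges

def degrees(edge_list):
    """Degree of every node that appears in the edge list."""
    return Counter(n for e in edge_list for n in e)

# Toy graph (assumption): a ring of 10 nodes, every node has degree 2.
ring = [(i, (i + 1) % 10) for i in range(10)]
rewired = double_edge_swap(ring, n_swaps=5)
print(rewired)
```

Each swap leaves every node's degree unchanged while altering which nodes are connected, so the amount of original structure that survives can be tuned through the number of swaps.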


Embedding-based Silhouette community detection

Machine Learning ◽

10.1007/s10994-020-05882-8 ◽

2020 ◽

Vol 109 (11) ◽

pp. 2161-2193 ◽

Cited By ~ 1

Author(s):

Blaž Škrlj ◽

Jan Kralj ◽

Nada Lavrač

Keyword(s):

Community Detection ◽

Real Life ◽

Interaction Network ◽

Subgroup Discovery ◽

Complex Data ◽

Detection Algorithms ◽

Scientific Disciplines ◽

Network Communities ◽

Art Community ◽

Node Embeddings

Mining complex data in the form of networks is of increasing interest in many scientific disciplines. Network communities correspond to densely connected subnetworks, and often represent key functional parts of real-world systems. This paper proposes the embedding-based Silhouette community detection (SCD), an approach for detecting communities, based on clustering of network node embeddings, i.e. real valued representations of nodes derived from their neighborhoods. We investigate the performance of the proposed SCD approach on 234 synthetic networks, as well as on a real-life social network. Even though SCD is not based on any form of modularity optimization, it performs comparably to or better than state-of-the-art community detection algorithms, such as InfoMap and Louvain. Further, we demonstrate that SCD’s outputs can be used along with domain ontologies in semantic subgroup discovery, yielding human-understandable explanations of communities detected in a real-life protein interaction network. Being embedding-based, SCD is widely applicable and can be tested out-of-the-box as part of many existing network learning and exploration pipelines.
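To illustrate how a clustering of node embeddings can be scored, here is a minimal sketch of the standard silhouette coefficient, assuming that is what the method's name refers to. It is not the SCD implementation. The embeddings below are hand-made two-dimensional points standing in for learned node embeddings, and the function name is hypothetical.

```python
import math

def silhouette(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance).

    Assumes at least two clusters and that not all points coincide.
    """
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:  # singleton cluster: the score is defined as 0
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Stand-in "embeddings" (assumption): two well-separated pairs of points.
embeddings = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
good = silhouette(embeddings, [0, 0, 1, 1])
bad = silhouette(embeddings, [0, 1, 0, 1])
print(good, bad)
```

The score lies between -1 and 1. A community assignment that groups nearby embeddings scores higher than one that mixes them, which is why it can be used to choose among candidate clusterings.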


Text Semantic Annotation: A Distributed Methodology Based on Community Coherence

Algorithms ◽

10.3390/a13070160 ◽

2020 ◽

Vol 13 (7) ◽

pp. 160

Author(s):

Christos Makris ◽

Georgios Pispirigos ◽

Michael Angelos Simos

Keyword(s):

Social Sciences ◽

Community Detection ◽

Chemical Engineering ◽

Semantic Annotation ◽

Digital Marketing ◽

Bag Of Words ◽

Detection Algorithms ◽

Text Annotation ◽

Knowledge Based ◽

Scientific Fields

Text annotation is the process of mapping the sense of a textual segment within a given context to a corresponding entity in a concept ontology. As the limitations of the bag-of-words paradigm become increasingly discernible in modern applications, several information retrieval and artificial intelligence tasks are shifting to semantic representations to address the inherent polysemy and homonymy challenges of natural language. With extensive application in a broad range of scientific fields, such as digital marketing, bioinformatics, chemical engineering, neuroscience, and social sciences, community detection has attracted great scientific interest. In linguistics, where it aims to identify densely interconnected subgroups of semantic ontologies, community detection has proven beneficial for improving disambiguation and enhancing ontologies. In this paper we introduce a novel distributed supervised knowledge-based methodology employing community detection algorithms for text annotation with Wikipedia entities, establishing the novel concept of community coherence as a metric for local contextual coherence compatibility. Our experimental evaluation revealed that deeper inference of relatedness and local entity community coherence in the Wikipedia graph yields substantial overall improvements by focusing on improving the accuracy of less common annotations. The proposed methodology attains robust disambiguation performance and is promising for wider adoption.


Community Detection in Multiplex Networks

ACM Computing Surveys ◽

10.1145/3444688 ◽

2021 ◽

Vol 54 (3) ◽

pp. 1-35

Author(s):

Matteo Magnani ◽

Obaida Hanteer ◽

Roberto Interdonato ◽

Luca Rossi ◽

Andrea Tagarelli

Keyword(s):

Community Detection ◽

Experimental Evaluation ◽

Network Models ◽

Ground Truth ◽

Community Structures ◽

Multiplex Networks ◽

Detection Algorithms ◽

Multiplex Network ◽

The Right ◽

Modes Of Interaction

A multiplex network models different modes of interaction among same-type entities. In this article, we provide a taxonomy of community detection algorithms in multiplex networks. We characterize the different algorithms based on various properties and we discuss the type of communities detected by each method. We then provide an extensive experimental evaluation of the reviewed methods to answer three main questions: to what extent the evaluated methods are able to detect ground-truth communities, to what extent different methods produce similar community structures, and to what extent the evaluated methods are scalable. One goal of this survey is to help scholars and practitioners to choose the right methods for the data and the task at hand, while also emphasizing when such choice is problematic.


Deep autoencoder-based community detection in complex networks with particle swarm optimization and continuation algorithms

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-201342 ◽

2021 ◽

pp. 1-17

Author(s):

Mohammed Al-Andoli ◽

Wooi Ping Cheah ◽

Shing Chiang Tan

Keyword(s):

Particle Swarm Optimization ◽

Complex Networks ◽

Learning Community ◽

Community Detection ◽

Particle Swarm ◽

Premature Convergence ◽

Swarm Optimization ◽

Detection Algorithms ◽

Real World Datasets ◽

The Cost

Detecting communities is an important multidisciplinary research discipline and is considered vital to understand the structure of complex networks. Deep autoencoders have been successfully proposed to solve the problem of community detection. However, existing models in the literature are trained based on gradient descent optimization with the backpropagation algorithm, which is known to converge to local minima and prove inefficient, especially in big data scenarios. To tackle these drawbacks, this work proposed a novel deep autoencoder with Particle Swarm Optimization (PSO) and continuation algorithms to reveal community structures in complex networks. The PSO and continuation algorithms were utilized to avoid the local minimum and premature convergence, and to reduce overall training execution time. Two objective functions were also employed in the proposed model: minimizing the cost function of the autoencoder, and maximizing the modularity function, which refers to the quality of the detected communities. This work also proposed other methods to work in the absence of continuation, and to enable premature convergence. Extensive empirical experiments on 11 publicly available real-world datasets demonstrated that the proposed method is effective and promising for deriving communities in complex networks, as well as outperforming state-of-the-art deep learning community detection algorithms.
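The abstract does not give the modularity function explicitly. Presumably it refers to the standard Newman–Girvan modularity, which for an undirected graph with adjacency matrix $A$, node degrees $k_i$, $m$ edges in total, and community assignment $c_i$ for node $i$ can be written as follows (the symbols are the conventional ones, not notation taken from the paper):

```latex
Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)
```

Here $\delta(c_i, c_j)$ equals 1 when nodes $i$ and $j$ belong to the same community and 0 otherwise, so $Q$ measures how many more edges fall inside communities than would be expected if edges were placed at random with the same degrees.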
