When Less Is More: Systematic Analysis of Cascade-Based Community Detection

Liudmila Prokhorenkova; Alexey Tikhonov; Nelly Litvak

doi:10.1145/3494563

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3494563 ◽

2022 ◽

Vol 16 (4) ◽

pp. 1-22

Author(s):

Liudmila Prokhorenkova ◽

Alexey Tikhonov ◽

Nelly Litvak

Keyword(s):

Community Detection ◽

Information Diffusion ◽

Real Life ◽

Viral Marketing ◽

Systematic Analysis ◽

Less Is More ◽

Detection Algorithms ◽

Underlying Network ◽

Stable Performance ◽

High Level

Information diffusion, spreading of infectious diseases, and spreading of rumors are fundamental processes occurring in real-life networks. In many practical cases, one can observe when nodes become infected, but the underlying network, over which a contagion or information propagates, is hidden. Inferring properties of the underlying network is important since these properties can be used for constraining infections, forecasting, viral marketing, and so on. Moreover, for many applications, it is sufficient to recover only coarse high-level properties of this network rather than all its edges. This article conducts a systematic and extensive analysis of the following problem: Given only the infection times, find communities of highly interconnected nodes. This task significantly differs from the well-studied community detection problem since we do not observe a graph to be clustered. We carry out a thorough comparison between existing and new approaches on several large datasets and cover methodological challenges specific to this problem. One of the main conclusions is that the most stable performance and the most significant improvement on the current state-of-the-art are achieved by our proposed simple heuristic approaches agnostic to a particular graph structure and epidemic model. We also show that some well-known community detection algorithms can be enhanced by including edge weights based on the cascade data.
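The idea of deriving edge weights from cascade data can be illustrated with a small sketch. This is not the authors' method: the weighting rule below (pairs of nodes infected close together in time within the same cascade receive more weight) and the function name `cascade_edge_weights` are assumptions chosen only for illustration, and each cascade is assumed to be a mapping from node to infection time.

```python
from collections import defaultdict
from itertools import combinations


def cascade_edge_weights(cascades):
    """Build pairwise weights from cascades (hypothetical heuristic).

    cascades: iterable of dicts mapping node -> infection time.
    Returns a dict mapping a sorted node pair (u, v) to a weight.
    Each cascade containing both u and v adds 1 / (1 + |t_u - t_v|),
    so nodes infected close together in time accumulate more weight.
    """
    weights = defaultdict(float)
    for cascade in cascades:
        # Sorting the nodes makes every pair key canonical: (u, v) with u < v.
        for u, v in combinations(sorted(cascade), 2):
            gap = abs(cascade[u] - cascade[v])
            weights[(u, v)] += 1.0 / (1.0 + gap)
    return dict(weights)


# Toy data: two cascades over the nodes a, b, c.
toy_cascades = [
    {"a": 0.0, "b": 1.0, "c": 3.0},
    {"a": 2.0, "b": 2.0},
]
toy_weights = cascade_edge_weights(toy_cascades)
```

The resulting weighted pairs could then be handed to any weighted clustering routine; which routine works best is an empirical question that the article itself examines.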

Recent trends on community detection algorithms: A survey

Modern Physics Letters B ◽

10.1142/s0217984920504084 ◽

2020 ◽

Vol 34 (35) ◽

pp. 2050408

Author(s):

Sumit Gupta ◽

Dhirendra Pratap Singh

Keyword(s):

Community Detection ◽

Real Life ◽

Detection Algorithm ◽

Data Sets ◽

Detection Algorithms ◽

Life Problems ◽

Application Data ◽

Art Community ◽

Community Detection Algorithm ◽

Two Parameters

Many real-life problems and application data can be represented as graphs. As technology advances rapidly, applications generate vast amounts of valuable data, and the graphs that represent them grow accordingly. Extracting meaningful information from these data has become a hot research topic, and methodical algorithms are required to extract useful information from the raw data. These unstructured graphs are not scattered in nature; they exhibit relationships between their basic entities. Identifying communities based on these relationships improves the understanding of the applications represented by the graphs. Community detection algorithms are one solution: they divide the graph into small clusters in which nodes are densely connected within a cluster and sparsely connected across clusters. During the last decade, many algorithms have been proposed, which fall into two broad categories: non-overlapping and overlapping community detection algorithms. The goal of this paper is to offer a comparative analysis of the various community detection algorithms. We bring together the state-of-the-art community detection algorithms of these two classes in a single article, together with their accessible benchmark data sets. Finally, we present a comparison of these algorithms with respect to two parameters: time efficiency and the accuracy with which communities are detected.

Evaluating the role of community detection in improving influence maximization heuristics

Social Network Analysis and Mining ◽

10.1007/s13278-021-00804-5 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

László Hajdu ◽

Miklós Krész ◽

András Bóta

Keyword(s):

Community Detection ◽

Real Life ◽

Detection Algorithm ◽

Influence Maximization ◽

Overlapping Community Detection ◽

Detection Algorithms ◽

Overlapping Community ◽

Community Detection Algorithm ◽

The Cost

Both community detection and influence maximization are well-researched fields of network science. Here, we investigate how several popular community detection algorithms can be used as part of a heuristic approach to influence maximization. The heuristic is based on the community value, a node-based metric defined on the outputs of overlapping community detection algorithms. This metric is used to select nodes as high influence candidates for expanding the set of influential nodes. Our aim in this paper is twofold. First, we evaluate the performance of eight frequently used overlapping community detection algorithms on this specific task to show how much improvement can be gained compared to the originally proposed method of Kempe et al. Second, selecting the community detection algorithm(s) with the best performance, we propose a variant of the influence maximization heuristic with significantly reduced runtime, at the cost of slightly reduced quality of the output. We use both artificial benchmarks and real-life networks to evaluate the performance of our approach.

A Distributed Bagging Ensemble Methodology for Community Prediction in Social Networks

Information ◽

10.3390/info11040199 ◽

2020 ◽

Vol 11 (4) ◽

pp. 199 ◽

Cited By ~ 3

Author(s):

Christos Makris ◽

Georgios Pispirigos ◽

Ioannis Orestis Rizos

Keyword(s):

Community Detection ◽

Real Life ◽

Ensemble Methods ◽

Detection Algorithms ◽

Edge Betweenness ◽

Social Graphs ◽

Published Research ◽

Vertex Centrality ◽

Scientific Fields ◽

Bagging Ensemble

Presently, due to the extended availability of gigantic information networks and the beneficial application of graph analysis in various scientific fields, the necessity for efficient and highly scalable community detection algorithms has never been more essential. Despite the significant amount of published research, the existing methods—such as the Girvan–Newman, random-walk edge betweenness, vertex centrality, InfoMap, spectral clustering, etc.—have virtually been proven incapable of handling real-life social graphs due to the intrinsic computational restrictions that lead to mediocre performance and poor scalability. The purpose of this article is to introduce a novel, distributed community detection methodology which in accordance with the community prediction concept, leverages the reduced complexity and the decreased variance of the bagging ensemble methods, to unveil the subjacent community hierarchy. The proposed approach has been thoroughly tested, meticulously compared against different classic community detection algorithms, and practically proven exceptionally scalable, eminently efficient, and promisingly accurate in unfolding the underlying community structure.

Efficient Estimation of Network Games of Incomplete Information: Application to Large Online Social Networks

Management Science ◽

10.1287/mnsc.2020.3885 ◽

2021 ◽

Author(s):

Xi Chen ◽

Ralf van der Lans ◽

Michael Trusov

Keyword(s):

Social Networks ◽

Social Influence ◽

Incomplete Information ◽

Community Detection ◽

Online Social Networks ◽

Choice Model ◽

Real Life ◽

Efficient Estimation ◽

Data Set ◽

Detection Algorithms

This paper presents a structural discrete choice model with social influence for large-scale social networks. The model is based on an incomplete information game and permits individual-specific parameters of consumers. It is challenging to apply this type of model to real-life scenarios for two reasons: (1) the computation of the Bayesian–Nash equilibrium is highly demanding; and (2) the identification of social influence requires the use of excluded variables that are oftentimes unavailable. To address these challenges, we derive the unique equilibrium conditions of the game, which allow us to employ a stochastic Bayesian estimation procedure that is scalable to large social networks. To facilitate the identification, we utilize community-detection algorithms to divide the network into different groups that, in turn, can be used to construct excluded variables. We validate the proposed structural model with the login decisions of more than 25,000 users of an online social game. Importantly, this data set also contains promotions that were exogenously determined and targeted to only a subgroup of consumers. This information allows us to perform exogeneity tests to validate our identification strategy using community-detection algorithms. Finally, we demonstrate the managerial usefulness of the proposed methodology for improving the strategies of targeting influential consumers in large social networks. This paper was accepted by Matthew Shum, marketing.

Constructing Real-Life Benchmarks for Community Detection by Rewiring Edges

Complexity ◽

10.1155/2020/7096230 ◽

2020 ◽

Vol 2020 ◽

pp. 1-16 ◽

Cited By ~ 1

Author(s):

Jing Xiao ◽

Hong-Fei Ren ◽

Xiao-Ke Xu

Keyword(s):

Performance Evaluation ◽

Community Detection ◽

Multiple Scales ◽

Special Functions ◽

Structural Characteristics ◽

Real Life ◽

Original Network ◽

Community Structures ◽

Functional Characteristics ◽

Detection Algorithms

In order to make the performance evaluation of community detection algorithms more accurate and deepen our analysis of community structures and functional characteristics of real-life networks, a new benchmark constructing method is designed from the perspective of directly rewiring edges in a real-life network instead of building a model. Based on the method, two kinds of novel benchmarks with special functions are proposed. The first kind can accurately approximate the microscale and mesoscale structural characteristics of the original network, providing ideal proxies for real-life networks and helping to realize performance analysis of community detection algorithms when a real network varies characteristics at multiple scales. The second kind is able to independently vary the community intensity in each generated benchmark and make the robustness evaluation of community detection algorithms more accurate. Experimental results prove the effectiveness and superiority of our proposed method. It enables more real-life networks to be used to construct benchmarks and helps to deepen our analysis of community structures and functional characteristics of real-life networks.

Embedding-based Silhouette community detection

Machine Learning ◽

10.1007/s10994-020-05882-8 ◽

2020 ◽

Vol 109 (11) ◽

pp. 2161-2193 ◽

Cited By ~ 1

Author(s):

Blaž Škrlj ◽

Jan Kralj ◽

Nada Lavrač

Keyword(s):

Community Detection ◽

Real Life ◽

Interaction Network ◽

Subgroup Discovery ◽

Complex Data ◽

Detection Algorithms ◽

Scientific Disciplines ◽

Network Communities ◽

Art Community ◽

Node Embeddings

Mining complex data in the form of networks is of increasing interest in many scientific disciplines. Network communities correspond to densely connected subnetworks, and often represent key functional parts of real-world systems. This paper proposes the embedding-based Silhouette community detection (SCD), an approach for detecting communities, based on clustering of network node embeddings, i.e. real-valued representations of nodes derived from their neighborhoods. We investigate the performance of the proposed SCD approach on 234 synthetic networks, as well as on a real-life social network. Even though SCD is not based on any form of modularity optimization, it performs comparably to or better than state-of-the-art community detection algorithms, such as InfoMap and Louvain. Further, we demonstrate that SCD’s outputs can be used along with domain ontologies in semantic subgroup discovery, yielding human-understandable explanations of communities detected in a real-life protein interaction network. Being embedding-based, SCD is widely applicable and can be tested out-of-the-box as part of many existing network learning and exploration pipelines.
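For readers unfamiliar with the Silhouette score named in the approach, the following is a minimal sketch of the standard silhouette coefficient applied to clustered node embeddings. It is not the SCD implementation: the function name `silhouette_score`, the Euclidean distance, and the toy two-dimensional "embeddings" are all assumptions made for illustration.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (standard definition).

    points: list of equal-length numeric tuples (stand-ins for embeddings).
    labels: list of cluster labels, one per point.
    For each point, a is the mean distance to its own cluster and b is the
    smallest mean distance to any other cluster; s = (b - a) / max(a, b).
    """
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, point in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        a = sum(math.dist(point, points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(math.dist(point, points[j]) for j in members) / len(members)
            for label, members in clusters.items()
            if label != labels[i]
        )
        denominator = max(a, b)
        if denominator > 0:
            total += (b - a) / denominator
    return total / len(points)


# Toy embeddings: two tight pairs far apart from each other.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_score = silhouette_score(toy_points, [0, 0, 1, 1])
bad_score = silhouette_score(toy_points, [0, 1, 0, 1])
```

A clustering that groups nearby embeddings scores close to 1, while one that mixes distant points scores below 0, which is why the score can serve as a quality signal without any modularity computation.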

A Survey of Information Cascade Analysis

ACM Computing Surveys ◽

10.1145/3433000 ◽

2021 ◽

Vol 54 (2) ◽

pp. 1-36

Author(s):

Fan Zhou ◽

Xovee Xu ◽

Goce Trajcevski ◽

Kunpeng Zhang

Keyword(s):

Information Diffusion ◽

Research Work ◽

Viral Marketing ◽

Graph Representation ◽

Digital Information ◽

Information Cascades ◽

Challenges And Opportunities ◽

Scientific Papers ◽

Popularity Prediction ◽

Types Of Information

The deluge of digital information in our daily life (from user-generated content, such as microblogs and scientific papers, to online business, such as viral marketing and advertising) offers unprecedented opportunities to explore and exploit the trajectories and structures of the evolution of information cascades. Abundant research efforts, both academic and industrial, have aimed to reach a better understanding of the mechanisms driving the spread of information and quantifying the outcome of information diffusion. This article presents a comprehensive review and categorization of information popularity prediction methods, from feature engineering and stochastic processes, through graph representation, to deep learning-based approaches. Specifically, we first formally define different types of information cascades and summarize the perspectives of existing studies. We then present a taxonomy that categorizes existing works into the aforementioned three main groups as well as the main subclasses in each group, and we systematically review cutting-edge research work. Finally, we summarize the pros and cons of existing research efforts and outline the open challenges and opportunities in this field.

Community Detection in Multiplex Networks

ACM Computing Surveys ◽

10.1145/3444688 ◽

2021 ◽

Vol 54 (3) ◽

pp. 1-35

Author(s):

Matteo Magnani ◽

Obaida Hanteer ◽

Roberto Interdonato ◽

Luca Rossi ◽

Andrea Tagarelli

Keyword(s):

Community Detection ◽

Experimental Evaluation ◽

Network Models ◽

Ground Truth ◽

Community Structures ◽

Multiplex Networks ◽

Detection Algorithms ◽

Multiplex Network ◽

The Right ◽

Modes Of Interaction

A multiplex network models different modes of interaction among same-type entities. In this article, we provide a taxonomy of community detection algorithms in multiplex networks. We characterize the different algorithms based on various properties and we discuss the type of communities detected by each method. We then provide an extensive experimental evaluation of the reviewed methods to answer three main questions: to what extent the evaluated methods are able to detect ground-truth communities, to what extent different methods produce similar community structures, and to what extent the evaluated methods are scalable. One goal of this survey is to help scholars and practitioners to choose the right methods for the data and the task at hand, while also emphasizing when such choice is problematic.

Deep autoencoder-based community detection in complex networks with particle swarm optimization and continuation algorithms

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-201342 ◽

2021 ◽

pp. 1-17

Author(s):

Mohammed Al-Andoli ◽

Wooi Ping Cheah ◽

Shing Chiang Tan

Keyword(s):

Particle Swarm Optimization ◽

Complex Networks ◽

Learning Community ◽

Community Detection ◽

Particle Swarm ◽

Premature Convergence ◽

Swarm Optimization ◽

Detection Algorithms ◽

Real World Datasets ◽

The Cost

Detecting communities is an important multidisciplinary research discipline and is considered vital to understand the structure of complex networks. Deep autoencoders have been successfully proposed to solve the problem of community detection. However, existing models in the literature are trained based on gradient descent optimization with the backpropagation algorithm, which is known to converge to local minima and prove inefficient, especially in big data scenarios. To tackle these drawbacks, this work proposed a novel deep autoencoder with Particle Swarm Optimization (PSO) and continuation algorithms to reveal community structures in complex networks. The PSO and continuation algorithms were utilized to avoid the local minimum and premature convergence, and to reduce overall training execution time. Two objective functions were also employed in the proposed model: minimizing the cost function of the autoencoder, and maximizing the modularity function, which refers to the quality of the detected communities. This work also proposed other methods to work in the absence of continuation, and to enable premature convergence. Extensive empirical experiments on 11 publicly available real-world datasets demonstrated that the proposed method is effective and promising for deriving communities in complex networks, as well as outperforming state-of-the-art deep learning community detection algorithms.
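The modularity function mentioned as the second objective can be made concrete with a short sketch. It assumes the standard Newman modularity for an undirected, unweighted graph, Q = sum over communities c of (L_c / m - (d_c / 2m)^2), where m is the number of edges, L_c the number of edges inside c, and d_c the total degree of c. The abstract does not say which variant the authors optimize, and the function name `modularity` is illustrative.

```python
def modularity(edges, community_of):
    """Newman modularity of a partition (undirected, unweighted graph).

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to its community label.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph with no edges")

    internal_edges = {}  # L_c: edges with both endpoints in community c
    degree_sum = {}      # d_c: sum of degrees of the nodes in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal_edges[cu] = internal_edges.get(cu, 0) + 1

    return sum(
        internal_edges.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Toy graph: two triangles joined by a single bridge edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_triangles = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_block = {node: "A" for node in range(6)}

q_split = modularity(toy_edges, two_triangles)
q_single = modularity(toy_edges, one_block)
```

Splitting the toy graph into its two triangles gives a positive score, while placing all nodes in one community gives zero, which matches the intuition that higher modularity indicates better-separated communities.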

High expression of fibroblast activation protein (FAP) predicts poor outcome in high-grade serous ovarian cancer

BMC Cancer ◽

10.1186/s12885-020-07541-6 ◽

2020 ◽

Vol 20 (1) ◽

Author(s):

Min Li ◽

Xue Cheng ◽

Rong Rong ◽

Yan Gao ◽

Xiuwu Tang ◽

...

Keyword(s):

Ovarian Cancer ◽

Cox Regression ◽

Paraffin Section ◽

Progression Free Survival ◽

High Grade ◽

Serous Ovarian Cancer ◽

Systematic Analysis ◽

Cox Regression Analysis ◽

Genes Expression ◽

High Level

Background: High-grade serous ovarian cancer (HGSOC) is a fatal form of ovarian cancer. Previous studies indicated some potential biomarkers for clinical evaluation of HGSOC prognosis. However, there is a lack of systematic analysis of differentially expressed genes (DEGs) to screen and detect significant biomarkers of HGSOC. Methods: The TCGA database was used to analyze the expression of relevant genes in HGSOC. Outcomes of candidate gene expression, including overall survival (OS) and progression-free survival (PFS), were calculated by Cox regression analysis for hazard rates (HR). Histopathological investigation of the identified genes was carried out in 151 Chinese HGSOC patients to validate gene expression in different stages of HGSOC. Results: Of all 57,331 genes that were analyzed, FAP was identified as the only novel gene that significantly contributed to both OS and PFS of HGSOC. In addition, FAP had a consistent expression profile between carcinoma-paracarcinoma and early-advanced stages of HGSOC. Immunological tests in paraffin sections also confirmed that up-regulation of FAP was present in advanced-stage HGSOC patients. Prediction of FAP network association suggested that FN1 could be a potential downstream gene which further influenced HGSOC survival. Conclusions: High-level expression of FAP was associated with poor prognosis of HGSOC via the FN1 pathway.
