Fragmentation Coagulation Based Mixed Membership Stochastic Blockmodel

Zheng Yu; Xuhui Fan; Marcin Pietrasik; Marek Z. Reformat

doi:10.1609/aaai.v34i04.6148

Proceedings of the AAAI Conference on Artificial Intelligence ◽

2020 ◽

Vol 34 (04) ◽

pp. 6704-6711

Author(s):

Zheng Yu ◽

Xuhui Fan ◽

Marcin Pietrasik ◽

Marek Z. Reformat

Keyword(s):

Prior Information ◽

Structural Information ◽

Real World Data ◽

Posterior Inference ◽

Current Formulation ◽

Proposed Model ◽

Community Evolution ◽

Stochastic Blockmodel ◽

Community Information ◽

Group Information

The Mixed-Membership Stochastic Blockmodel (MMSB) is one of the state-of-the-art Bayesian relational methods for learning the complex hidden structure underlying network data. However, the current formulation of MMSB suffers from two issues: (1) prior information (e.g. the community structure of entities) cannot be well embedded in the model; (2) community evolution is not well described in the literature. Therefore, we propose a non-parametric fragmentation coagulation based Mixed Membership Stochastic Blockmodel (fcMMSB). Our model simultaneously performs entity-based clustering to capture the community information for entities and linkage-based clustering to derive the group information for links. In addition, the proposed model infers the network structure and models community evolution, manifested by the appearance and disappearance of communities, using the discrete fragmentation coagulation process (DFCP). By integrating the community structure with the group compatibility matrix, we derive a generalized version of MMSB. An efficient Gibbs sampling scheme with the Pólya-Gamma (PG) approach is implemented for posterior inference. We validate our model on synthetic and real-world data.
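For orientation, the generative process of the classic MMSB that this abstract generalizes can be sketched as follows. This is a minimal illustration of the standard model only, not of fcMMSB or its DFCP prior; the function name `sample_mmsb`, the symmetric Dirichlet parameter `alpha`, and the example compatibility matrix are assumptions introduced here.

```python
import random


def sample_mmsb(n_nodes, alpha, compat, seed=0):
    """Draw one directed network from a classic MMSB (illustrative sketch).

    compat[s][r] is the probability of a link when the sender acts in
    group s and the receiver acts in group r (the compatibility matrix).
    """
    rng = random.Random(seed)
    k = len(compat)

    # Each node gets a mixed-membership vector drawn from a symmetric
    # Dirichlet(alpha), built by normalizing independent Gamma draws.
    memberships = []
    for _ in range(n_nodes):
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
        total = sum(draws)
        memberships.append([d / total for d in draws])

    adjacency = [[0] * n_nodes for _ in range(n_nodes)]
    for i in range(n_nodes):
        for j in range(n_nodes):
            if i == j:
                continue
            # For every ordered pair, sender and receiver each pick a group
            # for this interaction from their own membership vector.
            s = rng.choices(range(k), weights=memberships[i])[0]
            r = rng.choices(range(k), weights=memberships[j])[0]
            # The link is a Bernoulli draw with the group-pair probability.
            adjacency[i][j] = 1 if rng.random() < compat[s][r] else 0
    return memberships, adjacency


# Hypothetical two-group example with dense within-group linkage.
memberships, adjacency = sample_mmsb(5, 0.5, [[0.9, 0.1], [0.1, 0.8]], seed=1)
```

The per-pair group draws are what make the membership "mixed": a node may act in different groups for different links, which is the linkage-based clustering the abstract refers to.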

Network Embedding on Hierarchical Community Structure Network

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3434747 ◽

2021 ◽

Vol 15 (4) ◽

pp. 1-23

Author(s):

Guojie Song ◽

Yun Wang ◽

Lun Du ◽

Yi Li ◽

Junshan Wang

Keyword(s):

Community Structure ◽

Structural Information ◽

Spherical Surface ◽

Network Embedding ◽

The Galaxy ◽

Community Information ◽

The Hierarchical Structure ◽

Network Properties ◽

Multi Class Classification ◽

Low Dimensional

Network embedding is a method of learning a low-dimensional vector representation of network vertices under the condition of preserving different types of network properties. Previous studies mainly focus on preserving structural information of vertices at a particular scale, like neighbor information or community information, but cannot preserve the hierarchical community structure, which would enable the network to be easily analyzed at various scales. Inspired by the hierarchical structure of galaxies, we propose the Galaxy Network Embedding (GNE) model, which formulates an optimization problem with spherical constraints to describe the hierarchical community structure preserving network embedding. More specifically, we present an approach of embedding communities into a low-dimensional spherical surface, the center of which represents the parent community they belong to. Our experiments reveal that the representations from GNE preserve the hierarchical community structure and show advantages in several applications such as vertex multi-class classification, network visualization, and link prediction. The source code of GNE is available online.

A Scalable Redefined Stochastic Blockmodel

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3442589 ◽

2021 ◽

Vol 15 (3) ◽

pp. 1-28

Author(s):

Xueyan Liu ◽

Bo Yang ◽

Hechang Chen ◽

Katarzyna Musial ◽

Hongxu Chen ◽

...

Keyword(s):

Large Scale ◽

Network Science ◽

Learning Algorithm ◽

State Of The Art ◽

Real World Data ◽

Computational Overhead ◽

Stochastic Blockmodel ◽

Np Hard Problem ◽

Large Scale Networks ◽

The Cost

The stochastic blockmodel (SBM) is a widely used statistical network representation model with good interpretability, expressiveness, generalization, and flexibility, and it has become prevalent and important in the field of network science in recent years. However, learning an optimal SBM for a given network is an NP-hard problem. This significantly limits the application of SBMs to large-scale networks, because existing SBM models and their learning methods incur substantial computational overhead. Reducing the cost of SBM learning and making it scalable to large-scale networks, while maintaining the good theoretical properties of SBM, remains an unresolved problem. In this work, we address this challenging task from a novel perspective of model redefinition. We propose a novel redefined SBM with Poisson distribution and its block-wise learning algorithm that can efficiently analyse large-scale networks. Extensive validation conducted on both artificial and real-world data shows that our proposed method significantly outperforms the state-of-the-art methods in terms of a reasonable trade-off between accuracy and scalability.
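As a rough illustration of what an SBM with Poisson-distributed edges looks like, the sketch below samples edge counts from a generic Poisson SBM. It is not the redefined model or the block-wise learning algorithm of the paper, whose details the abstract does not give; the function names, the block assignment `blocks`, and the rate matrix `rates` are assumptions made for this example.

```python
import math
import random


def sample_poisson(lam, rng):
    """Poisson draw via Knuth's multiplication method (fine for small rates)."""
    if lam <= 0:
        return 0
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_poisson_sbm(blocks, rates, seed=0):
    """Sample a symmetric matrix of edge counts from a generic Poisson SBM.

    blocks[i] is the block of node i; rates[a][b] is the expected number
    of edges between a node in block a and a node in block b.
    """
    rng = random.Random(seed)
    n = len(blocks)
    counts = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            c = sample_poisson(rates[blocks[i]][blocks[j]], rng)
            counts[i][j] = counts[j][i] = c  # undirected, no self-loops
    return counts


# Hypothetical example: two blocks, dense inside and sparse across.
counts = sample_poisson_sbm([0, 0, 1, 1], [[2.0, 0.1], [0.1, 1.5]], seed=3)
```

The edge distribution depends only on the pair of blocks, not on the individual nodes, which is the structural assumption shared by all SBM variants.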

Phaseless Gauss-Newton Inversion for Microwave Imaging

10.36227/techrxiv.11831295.v1 ◽

2020 ◽

Author(s):

Chaitanya Narendra ◽

Puyan Mojabi

Keyword(s):

Field Data ◽

Prior Information ◽

Structural Information ◽

Microwave Imaging ◽

The Other ◽

Total Field ◽

Field Phase ◽

Full Data ◽

L2 Norm ◽

Phase Data

A phaseless Gauss-Newton inversion (GNI) algorithm is developed for microwave imaging applications. In contrast to full-data microwave imaging inversion that uses complex (magnitude and phase) scattered field data, the proposed phaseless GNI algorithm inverts phaseless (magnitude-only) total field data. This phaseless Gauss-Newton inversion (PGNI) algorithm is augmented with three different forms of regularization, originally developed for complex GNI. First, we use the standard weighted L2 norm total variation multiplicative regularizer which is appropriate when there is no prior information about the object being imaged. We then use two other forms of regularization operators to incorporate prior information about the object being imaged into the PGNI algorithm. The first one, herein referred to as SL-PGNI, incorporates prior information about the expected relative complex permittivity values of the object of interest. The other, referred to as SP-PGNI, incorporates spatial priors (structural information) about the objects being imaged. The use of prior information aims to compensate for the lack of total field phase data. The PGNI, SL-PGNI, and SP-PGNI inversion algorithms are then tested against synthetic and experimental phaseless total field data.
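To make the phrase "phaseless (magnitude-only) total field data" concrete, the sketch below evaluates a data misfit that compares only squared magnitudes, so any phase carried by the simulated field is ignored. The normalized form used here is an assumption chosen for illustration and should not be read as the cost functional of the PGNI algorithm; all names and values are hypothetical.

```python
import cmath


def phaseless_misfit(measured_magnitudes, simulated_fields):
    """Normalized misfit between squared magnitudes (illustrative form).

    measured_magnitudes: real, non-negative magnitudes of the total field.
    simulated_fields: complex total-field values predicted by a forward model.
    Assumes at least one measured magnitude is non-zero.
    """
    numerator = sum(
        (m ** 2 - abs(e) ** 2) ** 2
        for m, e in zip(measured_magnitudes, simulated_fields)
    )
    denominator = sum(m ** 4 for m in measured_magnitudes)
    return numerator / denominator


# Hypothetical field samples; the "measurement" keeps magnitudes only.
simulated = [1.0 + 2.0j, -0.5 + 0.25j, 3.0j]
measured = [abs(e) for e in simulated]

# Rotating every simulated sample by a common phase leaves the misfit at
# (numerically) zero, which is why phase information must come from priors.
rotated = [e * cmath.exp(0.7j) for e in simulated]
misfit_rotated = phaseless_misfit(measured, rotated)
```

Because the misfit cannot distinguish fields that differ only in phase, the inverse problem has less information to work with, which is the gap the abstract says the prior information is meant to compensate for.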

Integrative pharmacogenomics to infer large-scale drug taxonomy

10.1101/046219 ◽

2016 ◽

Cited By ~ 2

Author(s):

Nehme El-Hachem ◽

Deena M.A. Gendoo ◽

Laleh Soltan Ghoraie ◽

Zhaleh Safikhani ◽

Petr Smirnov ◽

...

Keyword(s):

Drug Targets ◽

Large Scale ◽

Prior Information ◽

Structural Information ◽

Drug Efficacy ◽

Basic Drug ◽

Flexible Tool ◽

Experimental Drugs ◽

New Compounds ◽

Novel Drug

Identification of drug targets and mechanism of action (MoA) for new and uncharacterized drugs is important for optimization of drug efficacy. Current MoA prediction approaches largely rely on prior information including side effects, therapeutic indication and/or chemo-informatics. Such information is not transferable or applicable for newly identified, previously uncharacterized small molecules. Therefore, a shift in the paradigm of MoA predictions is necessary towards development of unbiased approaches that can elucidate drug relationships and efficiently classify new compounds with basic input data. We propose a new integrative computational pharmacogenomic approach, referred to as Drug Network Fusion (DNF), to infer scalable drug taxonomies that rely only on basic drug characteristics towards elucidating drug-drug relationships. DNF is the first framework to integrate drug structural information, high-throughput drug perturbation and drug sensitivity profiles, enabling drug classification of new experimental compounds with minimal prior information. We demonstrate that the DNF taxonomy succeeds in identifying pertinent and novel drug-drug relationships, making it suitable for investigating experimental drugs with potential new targets or MoA. We highlight how the scalability of DNF facilitates identification of key drug relationships across different drug categories, and serves as a flexible tool for potential clinical applications in precision medicine. Our results support DNF as a valuable resource to the cancer research community by providing new hypotheses on the compound MoA and potential insights for drug repurposing.

Biogc: A novel framework for biological network classification via machine learning

Intelligent Data Analysis ◽

10.3233/ida-205240 ◽

2021 ◽

Vol 25 (5) ◽

pp. 1153-1168

Author(s):

Bentian Li ◽

Dechang Pi ◽

Yunxia Lin ◽

Izhar Ahmed Khan

Keyword(s):

Large Scale ◽

Biological Network ◽

Structural Information ◽

Kernel Method ◽

Small Scale ◽

Network Data ◽

Graph Classification ◽

Accuracy Rate ◽

Proposed Model ◽

Complex Structural

Biological network classification is a highly challenging task in data mining because the networks contain complex structural information. Conventional biochemical experimental methods and existing intelligent algorithms still suffer from limitations such as immense experimental cost and low accuracy. To solve these problems, we propose a novel framework for biological graph classification named Biogc, which is specifically developed to predict the labels of both small-scale and large-scale biological network data flexibly and efficiently. Our framework first presents a simplified graph kernel method to capture the structural information of each graph. The resulting informative features are then used to train classifiers tailored to biological network data of different scales, which together form the prediction model. Extensive experiments on five benchmark biological network datasets for the graph classification task show that the proposed model Biogc outperforms the state-of-the-art methods, with an accuracy rate of 98.90% on a larger dataset and 99.32% on a smaller dataset.

An Attention-Based Graph Neural Network for Heterogeneous Structural Learning

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5833 ◽

2020 ◽

Vol 34 (04) ◽

pp. 4132-4139

Author(s):

Huiting Hong ◽

Hantao Guo ◽

Yucheng Lin ◽

Xiaoqing Yang ◽

Zang Li ◽

...

Keyword(s):

Neural Network ◽

Structural Information ◽

Representation Learning ◽

Graph Representation ◽

Heterogeneous Information ◽

Domain Experts ◽

Proposed Model ◽

Meta Path ◽

Low Dimensional ◽

Public Datasets

In this paper, we focus on graph representation learning of heterogeneous information networks (HINs), in which various types of vertices are connected by various types of relations. Most existing methods for HINs adapt homogeneous graph embedding models via meta-paths to learn a low-dimensional vector space of the HIN. We propose a novel Heterogeneous Graph Structural Attention Neural Network (HetSANN) to directly encode the structural information of a HIN without meta-paths and achieve more informative representations. With this method, domain experts are no longer needed to design meta-path schemes, and the heterogeneous information can be processed automatically by our proposed model. Specifically, we implicitly represent heterogeneous information using the following two methods: 1) we model the transformation between heterogeneous vertices through a projection in low-dimensional entity spaces; 2) afterwards, we apply the graph neural network to aggregate multi-relational information of the projected neighborhood by means of an attention mechanism. We also present three extensions of HetSANN, i.e., voices-sharing product attention for the pairwise relationships in HIN, cycle-consistency loss to retain the transformation between heterogeneous entity spaces, and multi-task learning with full use of information. The experiments conducted on three public datasets demonstrate that our proposed models achieve significant and consistent improvements compared to state-of-the-art solutions.

Atrial Fibrillation Detection Directly from Compressed ECG with the Prior of Measurement Matrix

Information ◽

10.3390/info11090436 ◽

2020 ◽

Vol 11 (9) ◽

pp. 436

Author(s):

Yunfei Cheng ◽

Ying Hu ◽

Mengshu Hou ◽

Tongjie Pan ◽

Wenwen He ◽

...

Keyword(s):

Atrial Fibrillation ◽

Compressed Sensing ◽

Health Monitoring ◽

Prior Information ◽

Classification Performance ◽

Experimental Results ◽

Measurement Matrix ◽

Ecg Signals ◽

Proposed Model ◽

Atrial Fibrillation Detection

In wearable health monitoring based on compressed sensing, detecting atrial fibrillation directly from the compressed ECG can effectively reduce the time cost of data processing compared with classification after reconstruction. However, the existing methods for atrial fibrillation detection from compressed ECG do not fully benefit from the available prior information, resulting in unsatisfactory classification performance, especially in applications that require a high compression ratio (CR). In this paper, we propose a deep learning method to detect atrial fibrillation directly from compressed ECG without reconstruction. Specifically, we design a deep network model for one-dimensional ECG signals, and the measurement matrix is used to initialize the first layer of the model so that the proposed model can obtain more prior information, which helps improve the classification performance of atrial fibrillation detection from compressed ECG. The experimental results on the MIT-BIH Atrial Fibrillation Database show that when the CR is 10%, the accuracy and F1 score of the proposed method reach 97.52% and 98.02%, respectively. Compared with atrial fibrillation detection from the original ECG, the corresponding accuracy and F1 score are reduced by only 0.88% and 0.69%. Even at a high CR of 90%, the accuracy and F1 score are still reduced by only 6.77% and 5.31%, respectively. All of the experimental results demonstrate that the proposed method is superior to other existing methods for atrial fibrillation detection from compressed ECG. Therefore, the proposed method is promising for atrial fibrillation detection in wearable health monitoring based on compressed sensing.
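The compression step that such a detector consumes can be sketched as a linear projection y = Φx of a signal window x by a measurement matrix Φ. The example below is only an illustration of that step: it assumes that CR denotes the fraction of samples discarded, CR = (n - m) / n, and it uses a random Gaussian Φ, neither of which is confirmed by the abstract. The function names and the toy signal are hypothetical, and the way the paper uses Φ to initialize the first network layer is not reproduced here.

```python
import math
import random


def make_measurement_matrix(n, cr, seed=0):
    """Random Gaussian measurement matrix with m = n * (1 - cr) rows.

    Assumes CR is the fraction of samples removed (an assumption made
    for this sketch, not taken from the source).
    """
    rng = random.Random(seed)
    m = max(1, round(n * (1.0 - cr)))
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(m)]


def compress(phi, signal):
    """Compute the compressed measurement y = phi @ signal."""
    return [sum(p * v for p, v in zip(row, signal)) for row in phi]


# Hypothetical 100-sample window compressed at CR = 90% under the
# assumption above, leaving 10 measurements.
window = [math.sin(2.0 * math.pi * t / 25.0) for t in range(100)]
phi = make_measurement_matrix(100, 0.9, seed=7)
compressed = compress(phi, window)
```

Under this reading, a higher CR leaves fewer measurements per window, which is consistent with the abstract reporting a larger accuracy drop at 90% than at 10%.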

Fractional Dynamics in Soccer Leagues

Symmetry ◽

10.3390/sym12030356 ◽

2020 ◽

Vol 12 (3) ◽

pp. 356 ◽

Cited By ~ 2

Author(s):

António M. Lopes ◽

Jose A. Tenreiro Machado

Keyword(s):

Mathematical Modeling ◽

Fractional Calculus ◽

Real World ◽

Complex Dynamics ◽

Statistical Information ◽

Fractional Dynamics ◽

Entropy Analysis ◽

Real World Data ◽

Proposed Model ◽

Soccer Teams

This paper addresses the dynamics of four European soccer teams over the 2018–2019 season. The modeling perspective adopts the concepts of fractional calculus and power law. The proposed model implicitly embeds details such as the behavior of players and coaches, strategical and tactical maneuvers during the matches, errors of referees and a multitude of other effects. The scale of observation focuses on the teams’ behavior at each round. Two approaches are considered, namely the evaluation of the team progress along the league by a variety of heuristic models fitting real-world data, and the analysis of statistical information by means of entropy. The best models are also adopted for predicting future results, and their performance is compared with the real outcome. The computational and mathematical modeling leads to results that are analyzed and interpreted in the light of fractional dynamics. The emergence of patterns in both the heuristic modeling and the entropy analysis highlights similarities in different national leagues and points towards some underlying complex dynamics.
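As a small illustration of what "analysis of statistical information by means of entropy" can mean in practice, the sketch below computes the Shannon entropy of a team's match outcomes. The abstract does not say which entropy measure or which statistics the authors use, so both the choice of Shannon entropy and the win/draw/loss encoding are assumptions; the sample sequences are made up.

```python
import math
from collections import Counter


def shannon_entropy(outcomes):
    """Shannon entropy, in bits, of the empirical outcome distribution."""
    total = len(outcomes)
    counts = Counter(outcomes)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical season fragments: "W" = win, "D" = draw, "L" = loss.
dominant_team = ["W", "W", "W", "W", "W", "W"]
erratic_team = ["W", "D", "L", "W", "D", "L"]

low = shannon_entropy(dominant_team)    # one outcome only
high = shannon_entropy(erratic_team)    # three equally frequent outcomes
```

A team whose results are spread evenly over the three outcomes has higher entropy than a team that almost always produces the same result, so the measure summarizes how predictable a team's season was.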

Optimizing the Borrowing Limit and Interest Rate in P2P System: From Borrowers’ Perspective

Scientific Programming ◽

10.1155/2018/2613739 ◽

2018 ◽

Vol 2018 ◽

pp. 1-14

Author(s):

Zhihong Li ◽

Lanteng Wu ◽

Hongting Tang

Keyword(s):

Neural Network ◽

Interest Rate ◽

Bp Neural Network ◽

Interval Estimation ◽

Real World Data ◽

Proposed Model ◽

P2p Lending ◽

New Perspective ◽

P2p System ◽

Benefits And Costs

P2P (peer-to-peer) lending is an emerging online service that allows individuals to borrow money from unrelated persons without the intervention of traditional financial intermediaries. On these platforms, the borrowing limit and the interest rate are two of the most notable elements for borrowers, directly influencing their borrowing benefits and costs, respectively. To that end, this paper introduces a BP neural network interval estimation (BPIE) algorithm to predict borrowers’ borrowing limit and interest rate based on their characteristics, and simultaneously develops a new parameter optimization algorithm (GBPO), based on the genetic algorithm and our BP neural network predictive model, to optimize them. Using real-world data from http://ppdai.com, the experimental results show that our proposed model achieves good performance. This research provides a new perspective, that of borrowers, for exploring P2P lending. The case base and the proposed knowledge are the two contributions to FinTech research.

Bayesian modelling of compositional heterogeneity in molecular phylogenetics

Statistical Applications in Genetics and Molecular Biology ◽

10.1515/sagmb-2013-0077 ◽

2014 ◽

Vol 13 (5) ◽

Cited By ~ 9

Author(s):

Sarah E. Heaps ◽

Tom M.W. Nye ◽

Richard J. Boys ◽

Tom A. Williams ◽

T. Martin Embley

Keyword(s):

Molecular Phylogenetics ◽

Data Augmentation ◽

Likelihood Function ◽

Sequence Evolution ◽

Compositional Heterogeneity ◽

Sequence Composition ◽

Posterior Inference ◽

Proposed Model ◽

Root Position ◽

Biological Interpretation

In molecular phylogenetics, standard models of sequence evolution generally assume that sequence composition remains constant over evolutionary time. However, this assumption is violated in many datasets which show substantial heterogeneity in sequence composition across taxa. We propose a model which allows compositional heterogeneity across branches, and formulate the model in a Bayesian framework. Specifically, the root and each branch of the tree are each associated with their own composition vector whilst a global matrix of exchangeability parameters applies everywhere on the tree. We encourage borrowing of strength between branches by developing two possible priors for the composition vectors: one in which information can be exchanged equally amongst all branches of the tree and another in which more information is exchanged between neighbouring branches than between distant branches. We also propose a Markov chain Monte Carlo (MCMC) algorithm for posterior inference which uses data augmentation of substitutional histories to yield a simple complete data likelihood function that factorises over branches and allows Gibbs updates for most parameters. Standard phylogenetic models are not informative about the root position. Therefore a significant advantage of the proposed model is that it allows inference about rooted trees. The position of the root is fundamental to the biological interpretation of trees, both for polarising trait evolution and for establishing the order of divergence among lineages. Furthermore, unlike some other related models from the literature, inference in the model we propose can be carried out through a simple MCMC scheme which does not require problematic dimension-changing moves. We investigate the performance of the model and priors in analyses of two alignments for which there is strong biological opinion about the tree topology and root position.
