ALPINE: Active Link Prediction Using Network Embedding

Many real-world problems can be formalized as predicting links in a partially observed network. Examples include Facebook friendship suggestions, the prediction of protein–protein interactions, and the identification of hidden relationships in a crime network. Several link prediction algorithms, notably those recently introduced using network embedding, are capable of doing this by just relying on the observed part of the network. Often, whether two nodes are linked can be queried, albeit at a substantial cost (e.g., by questionnaires, wet lab experiments, or undercover work). Such additional information can improve the link prediction accuracy, but owing to the cost, the queries must be made with due consideration. Thus, we argue that an active learning approach is of great potential interest and developed ALPINE (Active Link Prediction usIng Network Embedding), a framework that identifies the most useful link status by estimating the improvement in link prediction accuracy to be gained by querying it. We proposed several query strategies for use in combination with ALPINE, inspired by the optimal experimental design and active learning literature. Experimental results on real data not only showed that ALPINE was scalable and boosted link prediction accuracy with far fewer queries, but also shed light on the relative merits of the strategies, providing actionable guidance for practitioners.
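
To make the idea of choosing which link status to query concrete, here is a minimal sketch. It does not reproduce ALPINE's own query strategies, which the abstract does not spell out; it swaps in plain uncertainty sampling, a standard active learning heuristic. The function name `select_queries` and the probabilities are hypothetical.

```python
def select_queries(link_probs, budget):
    """Return the `budget` unobserved node pairs whose predicted link
    probability is closest to 0.5, i.e. the most uncertain ones.

    Assumption: uncertainty sampling stands in for ALPINE's strategies.
    """
    ranked = sorted(
        link_probs.items(),
        key=lambda item: (abs(item[1] - 0.5), item[0]),  # ties broken by pair name
    )
    return [pair for pair, _ in ranked[:budget]]


# Hypothetical predicted link probabilities for unobserved node pairs.
link_probs = {
    ("a", "b"): 0.95,
    ("a", "c"): 0.52,
    ("b", "d"): 0.10,
    ("c", "d"): 0.40,
}

queries = select_queries(link_probs, budget=2)
print(queries)
```

In a full loop, the queried statuses would be added to the observed network, the embedding retrained, and the selection repeated until the query budget is spent.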


JANE: Jointly Adversarial Network Embedding

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽ 10.24963/ijcai.2020/192 ◽ 2020

Author(s): Liang Yang ◽ Yuexue Wang ◽ Junhua Gu ◽ Chuan Wang ◽ Xiaochun Cao ◽ ...

Keyword(s): Link Prediction ◽ Real Data ◽ Semantic Space ◽ Network Embedding ◽ Generative Adversarial Network ◽ Adversarial Learning ◽ Adversarial Network ◽ Node Clustering ◽ Topology Information ◽ Embedding Methods

Motivated by the ability of Generative Adversarial Networks (GANs) to explore the latent semantic space and capture semantic variations in the data distribution, adversarial learning has been adopted in network embedding to improve robustness. However, this ability is lost in existing adversarially regularized network embedding methods, because their embeddings are compared directly with samples drawn from a perturbation (Gaussian) distribution without any rectification from real data. To overcome this issue, a novel Joint Adversarial Network Embedding (JANE) framework is proposed that jointly distinguishes real from fake combinations of embeddings, topology information, and node features. JANE contains three pluggable components: an Embedding module, a Generator module, and a Discriminator module. The overall objective function of JANE is defined in a min-max form, which can be optimized via alternating stochastic gradient updates. Extensive experiments demonstrate the superiority of the proposed JANE on link prediction (3% gains in both AUC and AP) and node clustering (5% gain in F1 score).
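
For orientation, the standard GAN objective has the min-max form shown below. This is only the generic template: the abstract does not give JANE's actual objective, which discriminates joint combinations of embeddings, topology, and node features. The symbols $G$, $D$, $p_{\text{data}}$, and $p_{z}$ are the usual GAN notation, not notation taken from the paper.

```latex
\min_{G} \max_{D} \;
  \mathbb{E}_{x \sim p_{\text{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

Alternating optimization updates $D$ to increase this value for a fixed $G$, then updates $G$ to decrease it for a fixed $D$.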


Link prediction based on network embedding and similarity transferring methods

Modern Physics Letters B ◽ 10.1142/s0217984920501699 ◽ 2020 ◽ Vol 34 (16) ◽ pp. 2050169

Author(s): Wei Yu ◽ Xiaoyu Liu ◽ Bo Ouyang

Keyword(s): Real World ◽ Link Prediction ◽ Free Parameter ◽ Network Science ◽ Ad Hoc ◽ Prediction Algorithm ◽ Network Embedding ◽ Science Community ◽ The Cost ◽ Accuracy Of Prediction

In network science, link prediction is a technique used to predict missing or future relationships based on currently observed connections. The network science community has recently paid much attention to this direction. However, most existing approaches predict links based on ad hoc similarity definitions. To address this issue, we propose a link prediction algorithm named Transferring Similarity Based on Adjacency Embedding (TSBAE). TSBAE is based on network embedding: the latent structural information is preserved in the embedded vector space, and similarity is inherently captured by the distance between these vectors. Furthermore, because similarity should be transferable, indirect similarity between nodes is incorporated to improve prediction accuracy. Experimental results on 10 real-world networks show that TSBAE outperforms the baseline algorithms in the task of link prediction, at the cost of tuning a free parameter in the prediction.
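
The two ingredients described above, distance-based similarity in an embedding space and a transfer step that adds indirect similarity, can be sketched as follows. This is an illustration under stated assumptions, not the TSBAE algorithm itself: the similarity function `1 / (1 + distance)`, the one-step transfer `S + eps * S @ S`, and the names `distance_similarity`, `transfer_similarity`, and `eps` are all hypothetical choices, with `eps` standing in for the free parameter the abstract mentions.

```python
import math


def distance_similarity(embeddings):
    """Similarity from embedding distance: s_ij = 1 / (1 + ||x_i - x_j||).

    Assumption: this particular distance-to-similarity map is illustrative.
    """
    n = len(embeddings)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                dist = math.dist(embeddings[i], embeddings[j])
                sim[i][j] = 1.0 / (1.0 + dist)
    return sim


def transfer_similarity(sim, eps):
    """Add one step of indirect similarity: S' = S + eps * (S @ S).

    The diagonal is left at zero because self-links are not predicted.
    """
    n = len(sim)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                indirect = sum(sim[i][k] * sim[k][j] for k in range(n))
                out[i][j] = sim[i][j] + eps * indirect
    return out


# Hypothetical 2-D embeddings for three nodes on a line.
embeddings = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
direct = distance_similarity(embeddings)
scores = transfer_similarity(direct, eps=0.5)
print(direct[0][2], scores[0][2])
```

Nodes 0 and 2 are far apart but both close to node 1, so the transfer step raises their score above the direct similarity.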


Assumptions of biological measurements: important considerations when evaluating western blot data

10.22541/au.161144035.53972444/v1 ◽ 2021

Author(s): Maxwell DeNies ◽ Allen Liu ◽ Santiago Schnell

Keyword(s): Cell Signaling ◽ Western Blotting ◽ Protein Interactions ◽ Data Interpretation ◽ Biological Processes ◽ Protein Protein Interactions ◽ Post Translational Modification ◽ Western Blots ◽ Additional Information ◽ Experimental Variability

As technological and analytical innovations rapidly advance our ability to reveal increasingly complex biological processes, understanding the assumptions behind biological measurements and their sources of uncertainty is essential for data interpretation. This is particularly important in fields such as cell signaling: because signaling matters for both homeostatic and pathogenic biological processes, a quantitative understanding of the basic mechanisms of these transient events is fundamental to drug development. Although developed decades ago, western blotting remains an indispensable research tool for probing cell signaling, protein expression, and protein-protein interactions. While improvements in statistical and methodological reporting have improved data quality, understanding the basic experimental assumptions and visually inspecting western blots provide additional information that is useful when evaluating experimental conclusions. Using agonist-induced receptor post-translational modification as an example, we highlight the assumptions of western blotting and showcase how clues from raw western blots can hint at experimental variability that influences quantification but is not captured by statistics and methods. The purpose of this article is not to serve as a detailed review of the technical nuances and caveats of western blotting. Instead, using an example, we illustrate how experimental assumptions, design, and data normalization can be identified in raw data and influence data interpretation.


61 The impact of selective phenotyping and genotyping over generations in beef cattle

Journal of Animal Science ◽ 10.1093/jas/skz122.068 ◽ 2019 ◽ Vol 97 (Supplement_2) ◽ pp. 37-39

Author(s): Andrea Plotzki Reis ◽ Rodrigo Fagundes da Costa ◽ Fabyano Fonseca e Silva ◽ Fernando Flores Cardoso ◽ Matthew L Spangler

Keyword(s): Beef Cattle ◽ Prediction Accuracy ◽ Cost Benefit ◽ Single Step ◽ Breeding Values ◽ Additional Information ◽ Economic Framework ◽ Estimated Breeding Values ◽ The Cost ◽ The Impact

The aim of this study was to investigate selective phenotyping to maintain adequate prediction accuracy. A simulation was conducted, with 10 replicates, using QMSim to mimic the structure and size of a Braford population. A population with 50 generations, 500 animals per generation, was created with phenotyping and genotyping beginning in generation 11. The scenarios investigated were: 1) randomly phenotype and genotype 10, 25, 50, 75, and 100% of individuals each generation; and 2) randomly phenotype and genotype 10, 25, 50, 75, and 100% of individuals in every other generation. Estimated breeding values (EBV) were obtained using single-step GBLUP, and accuracy was determined as the correlation between true BV from simulation and those estimated from the blupf90 family of programs. For scenarios where phenotyping and genotyping occurred every generation, EBV accuracies in generations 11 and 50 ranged from 0.32 to 0.32, 0.42 to 0.43, 0.49 to 0.51, 0.53 to 0.56, and 0.57 to 0.59 when 10, 25, 50, 75, and 100% of animals were chosen, respectively. The highest accuracies were 0.40 and 0.50 in generation 38 for scenarios 10 and 25%, and 0.56, 0.61, and 0.64 in generation 40 for scenarios 50, 75, and 100%, respectively. When animals were selected every other generation, EBV accuracy in generations 11 and 50 ranged from 0.24 to 0.26, 0.36 to 0.36, 0.43 to 0.42, 0.48 to 0.44, and 0.53 to 0.48 for 10, 25, 50, 75, and 100% of selected animals, respectively. The highest accuracies were in generation 23 for scenario 10% (0.31), in generation 37 for scenarios 25% (0.43), 50% (0.50), and 75% (0.55), and in generation 39 for 100% (0.59). Although increasing the density of phenotyped and genotyped animals increased prediction accuracy, some gains were marginal. These differences in accuracy must be considered within an economic framework to determine the cost-benefit of additional information.
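
The accuracy measure used above, the correlation between true and estimated breeding values, can be sketched as a plain Pearson correlation. The helper name `pearson` and the breeding values below are made up for illustration; they are not data from the study, and the study's own pipeline (QMSim and the blupf90 programs) is not reproduced here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical true and estimated breeding values for four animals.
true_bv = [1.0, 2.0, 3.0, 4.0]
ebv = [1.1, 1.9, 3.2, 3.8]

accuracy = pearson(true_bv, ebv)
print(round(accuracy, 3))
```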


Improving protein-protein interactions prediction accuracy using protein evolutionary information and relevance vector machine model

Protein Science ◽ 10.1002/pro.2991 ◽ 2016 ◽ Vol 25 (10) ◽ pp. 1825-1833 ◽ Cited By ~ 22

Author(s): Ji-Yong An ◽ Fan-Rong Meng ◽ Zhu-Hong You ◽ Xing Chen ◽ Gui-Ying Yan ◽ ...

Keyword(s): Protein Interactions ◽ Prediction Accuracy ◽ Relevance Vector Machine ◽ Evolutionary Information ◽ Protein Protein Interactions ◽ Machine Model


ACT-SVM: Prediction of Protein-Protein Interactions Based on Support Vector Basis Model

Scientific Programming ◽ 10.1155/2020/8866557 ◽ 2020 ◽ Vol 2020 ◽ pp. 1-8

Author(s): Wenzheng Ma ◽ Yi Cao ◽ Wenzheng Bao ◽ Bin Yang ◽ Yuehui Chen

Keyword(s): Protein Interactions ◽ Prediction Accuracy ◽ Support Vector ◽ SVM Classifier ◽ Protein Protein Interactions ◽ SVM Model ◽ Novel Method ◽ H Pylori ◽ Almost All ◽ Human Dataset

Interactions between proteins play important roles in many organisms and are involved in almost all activities in the cell. Research on protein-protein interactions (PPIs) can make a substantial contribution to the prevention and treatment of diseases. Many machine learning methods have been proposed to predict PPIs. In this article, we propose a novel method, ACT-SVM, that can effectively predict PPIs. The ACT-SVM model maps protein sequences to numerical features, performs feature extraction twice on the protein sequence to obtain vector A and descriptor CT, and combines them into a single vector. The feature vectors of the two proteins in a pair are then merged as the input of a support vector machine (SVM) classifier. We use nonredundant H. pylori and human datasets to verify the prediction performance of our method. The proposed method achieves a prediction accuracy of 0.727897 on the H. pylori data and 0.838799 on the human dataset. The results suggest that the method is a stable and reliable prediction model for PPIs.
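
As a rough illustration of turning a protein pair into a fixed-length feature vector, the sketch below counts conjoint triads. It rests on an assumption: the abstract does not define "descriptor CT", and reading it as the conjoint triad encoding (seven residue classes, 343 class triads) is a guess. "Vector A" is not defined in the abstract either and is left out. The names `conjoint_triad` and `pair_features` and the two sequences are hypothetical, and raw counts are used without normalization.

```python
from itertools import product

# Seven residue classes commonly used for the conjoint triad encoding.
# Assumption: "descriptor CT" in the abstract refers to this encoding.
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
RESIDUE_CLASS = {aa: idx for idx, group in enumerate(CLASSES) for aa in group}

# All 7^3 = 343 ordered triples of classes, each mapped to a vector position.
TRIADS = list(product(range(len(CLASSES)), repeat=3))
TRIAD_INDEX = {triad: pos for pos, triad in enumerate(TRIADS)}


def conjoint_triad(sequence):
    """Count each class triad over consecutive residues of `sequence`.

    Residues outside the 20 standard amino acids are dropped before counting.
    """
    classes = [RESIDUE_CLASS[aa] for aa in sequence if aa in RESIDUE_CLASS]
    counts = [0] * len(TRIADS)
    for i in range(len(classes) - 2):
        counts[TRIAD_INDEX[tuple(classes[i:i + 3])]] += 1
    return counts


def pair_features(seq_a, seq_b):
    """Concatenate the descriptors of both proteins into one vector."""
    return conjoint_triad(seq_a) + conjoint_triad(seq_b)


# Hypothetical short sequences, for illustration only.
features = pair_features("MKVLAAGC", "DERKHW")
print(len(features), sum(features))
```

A vector built this way could be passed to any SVM implementation; the classifier itself is omitted to keep the sketch free of third-party dependencies.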


Contacts-based prediction of binding affinity in protein–protein complexes

eLife ◽ 10.7554/elife.07454 ◽ 2015 ◽ Vol 4 ◽ Cited By ~ 119

Author(s): Anna Vangone ◽ Alexandre MJJ Bonvin

Keyword(s): Binding Affinity ◽ Conformational Changes ◽ Protein Interactions ◽ Prediction Accuracy ◽ Protein Complexes ◽ Structural Features ◽ Specific Protein ◽ Strong Impact ◽ Protein Protein Interactions ◽ Almost All

Almost all critical functions in cells rely on specific protein–protein interactions. Understanding these is therefore crucial in the investigation of biological systems. Despite all past efforts, we still lack a thorough understanding of the energetics of association of proteins. Here, we introduce a new and simple approach to predict binding affinity based on functional and structural features of the biological system, namely the network of interfacial contacts. We assess its performance against a protein–protein binding affinity benchmark and show that both experimental methods used for affinity measurements and conformational changes have a strong impact on prediction accuracy. Using a subset of complexes with reliable experimental binding affinities and combining our contacts and contact-types-based model with recent observations on the role of the non-interacting surface in protein–protein interactions, we reach a high prediction accuracy for such a diverse dataset outperforming all other tested methods.


Using deep maxout neural networks to improve the accuracy of function prediction from protein interaction networks

10.1101/499244 ◽ 2018

Author(s): Cen Wan ◽ Domenico Cozzetto ◽ Rui Fa ◽ David T. Jones

Keyword(s): Neural Networks ◽ Protein Interaction ◽ Protein Interactions ◽ Protein Function ◽ Large Scale ◽ Protein Function Prediction ◽ Function Prediction ◽ Network Embedding ◽ Protein Protein Interactions ◽ Functional Representations

Protein-protein interaction network data provides valuable information for inferring direct links between genes and their biological roles. This information supports a fundamental hypothesis for protein function prediction: interacting proteins tend to have similar functions. With the help of recently developed network embedding feature generation methods and deep maxout neural networks, it is possible to extract functional representations that encode direct links between protein-protein interaction information and protein function. Our novel method, STRING2GO, adopts deep maxout neural networks to learn functional representations that simultaneously encode both protein-protein interactions and functional predictive information. The experimental results show that STRING2GO outperforms other network embedding-based prediction methods and one benchmark method adopted in a recent large-scale protein function prediction competition.


Advancing the prediction accuracy of protein-protein interactions by utilizing evolutionary information from position-specific scoring matrix and ensemble classifier

Journal of Theoretical Biology ◽ 10.1016/j.jtbi.2017.01.003 ◽ 2017 ◽ Vol 418 ◽ pp. 105-110 ◽ Cited By ~ 32

Author(s): Lei Wang ◽ Zhu-Hong You ◽ Shi-Xiong Xia ◽ Feng Liu ◽ Xing Chen ◽ ...

Keyword(s): Protein Interactions ◽ Prediction Accuracy ◽ Ensemble Classifier ◽ Position Specific Scoring Matrix ◽ Evolutionary Information ◽ Protein Protein Interactions ◽ Scoring Matrix


Road network link prediction model based on subgraph pattern

International Journal of Modern Physics C ◽ 10.1142/s0129183120500837 ◽ 2020 ◽ Vol 31 (06) ◽ pp. 2050083

Author(s): Bin Wang ◽ Xiaoxia Pan ◽ Yilei Li ◽ Jinfang Sheng ◽ Jun Long ◽ ...

Keyword(s): Prediction Model ◽ Link Prediction ◽ Prediction Accuracy ◽ Road Network ◽ Network Embedding ◽ Urban Road Network ◽ Urban Road ◽ The Road ◽ Network Link ◽ Subgraph Pattern

The urban road network (hereafter, the road network) is a complex and highly sparse network. Link prediction on the urban road network can predict urban structural changes and assist urban designers in decision-making. In this paper, a new link prediction model, ASFC, is proposed that is tailored to the characteristics of the road network. The model first embeds the road network using the road2vec algorithm, then combines subgraph patterns with the embedding results and the Katz index to construct an all-order subgraph feature covering low-, medium-, and high-order subgraph features, and finally trains a logistic regression classifier for road network link prediction. The experiments compare ASFC with other link prediction models on urban road networks of different countries and types, and examine the influence of model parameters on prediction accuracy. The results show that ASFC performs well in terms of prediction accuracy and stability.
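
Of the features named above, the Katz index has a standard definition: it scores a node pair by summing over all walks between them, with walks of length l damped by beta^l. The sketch below computes a truncated version of that sum. It covers only this one feature, not the ASFC model; the function names, the toy graph, and the values of `beta` and `max_len` are illustrative assumptions.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def katz_index(adj, beta, max_len):
    """Truncated Katz index: sum over l = 1..max_len of beta^l * A^l."""
    n = len(adj)
    scores = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in adj]  # holds A^l, starting at l = 1
    weight = beta                    # holds beta^l
    for _ in range(max_len):
        for i in range(n):
            for j in range(n):
                scores[i][j] += weight * power[i][j]
        power = mat_mul(power, adj)
        weight *= beta
    return scores


# Toy path graph 0 - 1 - 2 - 3, standing in for a small road segment network.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

katz = katz_index(adj, beta=0.1, max_len=3)
print(katz[0][1], katz[0][2], katz[0][3])
```

Pairs joined by shorter walks score higher, which is why the index is a common baseline feature for ranking candidate links.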
