Large scale graph mining for web reputation inference

Author(s):  
Yonghong Huang ◽  
Paula Greve
Author(s):  
Kai Zheng ◽  
Zhu-Hong You ◽  
Lei Wang ◽  
Leon Wong ◽  
Zhan-Heng Chen ◽  
...  

Motivation: PIWI proteins and Piwi-interacting RNAs (piRNAs) are commonly detected in human cancers, especially in germline and somatic tissues, and correlate with poorer clinical outcomes, suggesting that they play a functional role in cancer. As the combinatorial explosion of possible ncRNA-disease associations becomes increasingly apparent, new bioinformatics methods for large-scale identification and prioritization of potential associations are of interest. However, real-world networks of molecular interactions are enormously intricate and noisy, which poses a problem for efficient graph mining. This study makes a preliminary attempt at graph mining on biological networks.

Results: We present GAPDA, a method based on a graph attention network for identifying potential and biologically significant piRNA-disease associations (PDAs). The attention mechanism computes a hidden representation of an association in the network from its neighbor nodes and assigns weights to the inputs when making decisions. In particular, we introduce attention-based graph neural networks to the field of bio-association prediction for the first time and propose an abstract network topology suitable for small samples. Specifically, we combine piRNA sequence information and disease semantic similarity with the piRNA-disease association network to construct a new attribute network. In experiments, GAPDA achieved an AUC of 0.9038 in five-fold cross-validation and outperformed methods based on collaborative filtering and attribute features. The experimental results show that GAPDA supports the promise of graph neural networks for such problems and can be an excellent supplement for future biomedical research.

Supplementary information: Supplementary data are available at Bioinformatics online.
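As a rough illustration of how a graph attention layer weights neighbor nodes, the sketch below implements a single attention head in plain Python, following the standard graph attention formulation (score each neighbor with a LeakyReLU over a learned vector, normalize with a softmax, then take the weighted sum). It is not the GAPDA implementation: the function name `gat_layer`, the toy inputs, and the fixed parameters `W` and `a` are all assumptions made for this example, and a real model would learn those parameters.

```python
import math


def leaky_relu(x, slope=0.2):
    """LeakyReLU nonlinearity applied to raw attention scores."""
    return x if x > 0 else slope * x


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_layer(features, neighbors, W, a):
    """Single-head graph attention layer (illustrative sketch).

    features:  dict node -> input feature vector
    neighbors: dict node -> list of neighbor nodes (include the node
               itself if self-attention is wanted)
    W:         shared linear transform (hypothetical fixed weights)
    a:         attention vector of length 2 * output dimension
    Returns (new node representations, attention weights per node).
    """
    # Shared linear projection of every node's features.
    z = {n: matvec(W, h) for n, h in features.items()}
    out, attn = {}, {}
    for i, nbrs in neighbors.items():
        # Unnormalized score for each neighbor j: LeakyReLU(a . [z_i || z_j]).
        scores = [
            leaky_relu(sum(c * x for c, x in zip(a, z[i] + z[j])))
            for j in nbrs
        ]
        # Numerically stable softmax over the neighborhood.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        attn[i] = dict(zip(nbrs, alphas))
        # New representation: attention-weighted sum of projected neighbors.
        out[i] = [
            sum(al * z[j][k] for al, j in zip(alphas, nbrs))
            for k in range(len(z[i]))
        ]
    return out, attn


# Toy graph: three nodes with 2-dimensional features (made-up values).
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}
W = [[1.0, 0.0], [0.0, 1.0]]      # identity projection
a = [0.0, 0.0, 0.0, 0.0]          # zero vector gives uniform attention
hidden, weights = gat_layer(features, neighbors, W, a)
```

With the zero attention vector used here every neighbor receives the same weight, so each output is simply the mean of the neighborhood; non-zero values of `a` would shift weight toward particular neighbors.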


2021 ◽  
Vol 14 (13) ◽  
pp. 3416-3416
Author(s):  
Danai Koutra

Our ability to generate, collect, and archive data related to everyday activities, such as interacting on social media, browsing the web, and monitoring well-being, is rapidly increasing. Getting the most benefit from this large-scale data requires analysis of patterns it contains, which is computationally intensive or even intractable. Summarization techniques produce compact data representations (summaries) that enable faster processing by complex algorithms and queries. This talk will cover summarization of interconnected data (graphs) [3], which can represent a variety of natural processes (e.g., friendships, communication). I will present an overview of my group's work on bridging the gap between research on summarized network representations and real-world problems. Examples include summarization of massive knowledge graphs for refinement [2] and on-device querying [4], summarization of graph streams for persistent activity detection [1], and summarization within graph neural networks for fast, interpretable classification [5]. I will conclude with open challenges and opportunities for future research.


Author(s):  
Charalampos E. Tsourakakis

In this chapter, we present state-of-the-art work on large-scale graph mining using MapReduce. We survey research on an important graph mining problem: counting the number of triangles in large real-world networks. We present the most important applications of triangle counts and two families of algorithms, one spectral and one combinatorial, that solve the problem efficiently.
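To make the two families concrete, here is a minimal single-machine sketch in Python; it is not taken from the chapter and does not use MapReduce. The combinatorial function checks, for every node, which pairs of its neighbors are themselves connected. The second function uses the identity that the number of triangles in a simple undirected graph equals trace(A^3) / 6, where A is the adjacency matrix; because trace(A^3) is also the sum of the cubed eigenvalues of A, this identity is what spectral approaches build on. Both function names and the toy edge list are assumptions made for illustration.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map, ignoring self-loops."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def count_triangles_combinatorial(edges):
    """Node-iterator count: for each node, count connected neighbor pairs."""
    adj = build_adjacency(edges)
    total = 0
    for nbrs in adj.values():
        for a, b in combinations(nbrs, 2):
            if b in adj[a]:
                total += 1
    # Every triangle is found once from each of its three corners.
    return total // 3


def count_triangles_trace(edges):
    """Count triangles as trace(A^3) / 6 using dense integer arithmetic."""
    adj = build_adjacency(edges)
    nodes = list(adj)
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    A = [[0] * n for _ in range(n)]
    for u, nbrs in adj.items():
        for v in nbrs:
            A[index[u]][index[v]] = 1
    # A2 = A * A, then trace(A^3) = sum over i, j of A2[i][j] * A[j][i].
    A2 = [
        [sum(A[i][k] * A[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]
    trace = sum(A2[i][j] * A[j][i] for i in range(n) for j in range(n))
    # Each triangle contributes 6 closed walks of length 3.
    return trace // 6


# Toy graph: a complete graph on 4 nodes plus one pendant edge.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
n_comb = count_triangles_combinatorial(edges)
n_trace = count_triangles_trace(edges)
```

The dense matrix version is only practical for small graphs; at scale, a spectral method would instead approximate the sum of cubed eigenvalues from a few of the largest-magnitude eigenvalues.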


Author(s):  
Hongzhi Chen ◽  
Xiaoxi Wang ◽  
Chenghuan Huang ◽  
Juncheng Fang ◽  
Yifan Hou ◽  
...  

2019 ◽  
Vol 4 (1) ◽  
Author(s):  
Francesco Tudisco ◽  
Desmond J. Higham

Many graph mining tasks can be viewed as classification problems on high-dimensional data. Within this class we consider the issue of discovering core-periphery structure, which has wide applications in the economic and social sciences. In contrast to many current approaches, we allow for weighted and directed edges and we do not assume that the overall network is connected. Our approach extends recent work on a relevant relaxed nonlinear optimization problem. In the directed, weighted setting, we derive and analyze a globally convergent iterative algorithm. We also relate the algorithm to a maximum likelihood reordering problem on an appropriate core-periphery random graph model. We illustrate the effectiveness of the new algorithm on a large-scale directed email network.

