Clustering Large Collection of Biomedical Literature Based on Ontology-Enriched Bipartite Graph Representation and Mutual Refinement Strategy

Author(s):  
Illhoi Yoo ◽  
Xiaohua Hu
Author(s):  
Jie Cheng ◽  
Lu Lian ◽  
Zichen Xu ◽  
Dan Wu ◽  
Haoyang Zhu ◽  
...  

Sensors ◽  
2021 ◽  
Vol 22 (1) ◽  
pp. 3
Author(s):  
Giacomo Frisoni ◽  
Gianluca Moro ◽  
Giulio Carlassare ◽  
Antonella Carbonaro

The automatic extraction of biomedical events from the scientific literature has drawn keen interest in recent years, as it recognizes complex and semantically rich graphical interactions otherwise buried in texts. However, very few works revolve around learning embeddings or similarity metrics for event graphs. This gap leaves biological relations unlinked and prevents the application of machine learning techniques to promote discoveries. Taking advantage of recent deep graph kernel solutions and pre-trained language models, we propose Deep Divergence Event Graph Kernels (DDEGK), an unsupervised inductive method to map events into low-dimensional vectors, preserving their structural and semantic similarities. Unlike most other systems, DDEGK operates at a graph level and does not require task-specific labels, feature engineering, or known correspondences between nodes. To this end, our solution compares events against a small set of anchor events, trains cross-graph attention networks to draw pairwise alignments (bolstering interpretability), and employs transformer-based models to encode continuous attributes. Extensive experiments have been conducted on nine biomedical datasets. We show that our learned event representations can be effectively employed in tasks such as graph classification, clustering, and visualization, also facilitating downstream semantic textual similarity. Empirical results demonstrate that DDEGK significantly outperforms other state-of-the-art methods.
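The anchor-based idea can be pictured as embedding each event graph as the vector of its divergence scores against a fixed set of anchor graphs. This is a minimal sketch under loud assumptions: the edge-set encoding, the helper names `divergence` and `anchor_embedding`, and the toy event labels are hypothetical, and a plain Jaccard distance is swapped in for DDEGK's learned cross-graph attention scoring (no language-model attribute encoding is included).

```python
from typing import FrozenSet, List, Tuple

# Hypothetical encoding: an event graph as a set of labeled edges
# (trigger, role, argument). DDEGK itself works on richer attributed graphs.
EventGraph = FrozenSet[Tuple[str, str, str]]


def divergence(target: EventGraph, anchor: EventGraph) -> float:
    """Stand-in divergence score: Jaccard distance between labeled edge sets.

    DDEGK learns this score with cross-graph attention networks; this
    fixed proxy only shows where such a score plugs in.
    """
    union = target | anchor
    if not union:
        return 0.0
    return 1.0 - len(target & anchor) / len(union)


def anchor_embedding(target: EventGraph, anchors: List[EventGraph]) -> List[float]:
    """Embed a graph as its vector of divergences to each anchor graph."""
    return [divergence(target, anchor) for anchor in anchors]


# Toy anchors and event (hypothetical labels, not taken from the paper).
anchors = [
    frozenset({("Gene_expression", "Theme", "IL-2")}),
    frozenset({("Phosphorylation", "Theme", "p53"),
               ("Phosphorylation", "Cause", "ATM")}),
]
event = frozenset({("Phosphorylation", "Theme", "p53")})
print(anchor_embedding(event, anchors))  # [1.0, 0.5]
```

Because every graph is scored against the same anchors, the resulting vectors share one space and can feed clustering or classification directly; a learned divergence would replace the proxy used here.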


2021 ◽  
Vol 4 ◽  
Author(s):  
David Gordon ◽  
Panayiotis Petousis ◽  
Henry Zheng ◽  
Davina Zamanzadeh ◽  
Alex A.T. Bui

We present a novel approach for imputing missing data that incorporates temporal information into bipartite graphs through an extension of graph representation learning. Missing data is abundant in several domains, particularly when observations are made over time. Most imputation methods make strong assumptions about the distribution of the data. While novel methods may relax some assumptions, they may not consider temporality. Moreover, when such methods are extended to handle time, they may not generalize without retraining. We propose using a joint bipartite graph approach to incorporate temporal sequence information. Specifically, the observation nodes and edges with temporal information are used in message passing to learn node and edge embeddings and to inform the imputation task. Our proposed method, temporal setting imputation using graph neural networks (TSI-GNN), captures sequence information that can then be used within an aggregation function of a graph neural network. To the best of our knowledge, this is the first effort to use a joint bipartite graph approach that captures sequence information to handle missing data. We use several benchmark datasets to test the performance of our method under a variety of conditions, comparing it to both classic and contemporary methods. We further provide insight into managing the size of the generated TSI-GNN model. Through our analysis, we show that incorporating temporal information into a bipartite graph improves the representation at the 30% and 60% missing rates, specifically when using a nonlinear model for downstream prediction tasks in regularly sampled datasets, and is competitive with existing temporal methods under different scenarios.
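One way to picture the joint bipartite representation: observations and features become two node types, and each observed cell becomes an edge carrying its value and time step. The sketch below assumes a small dense matrix with `None` marking missing cells; the names `build_bipartite` and `impute`, the exponential time-decay weights, and the parameter `tau` are hypothetical, and the fixed weighted average stands in for TSI-GNN's learned message passing rather than reproducing it.

```python
import math
from typing import List, Optional, Tuple

# rows[t][j]: value of feature j at time step t, None where missing.
Matrix = List[List[Optional[float]]]
# Edge: (observation node, feature node, observed value, time step).
Edge = Tuple[str, str, float, int]


def build_bipartite(rows: Matrix) -> List[Edge]:
    """One edge per observed cell, linking its observation and feature nodes."""
    edges: List[Edge] = []
    for t, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                edges.append((f"obs{t}", f"feat{j}", value, t))
    return edges


def impute(edges: List[Edge], t: int, feature: str, tau: float = 1.0) -> Optional[float]:
    """Estimate `feature` at time step `t` from the feature node's edges.

    Edge values are averaged with weights exp(-|t - t_edge| / tau), a
    hypothetical, fixed stand-in for a learned sequence-aware aggregator.
    """
    num = den = 0.0
    for _obs, feat, value, t_edge in edges:
        if feat == feature:
            weight = math.exp(-abs(t - t_edge) / tau)
            num += weight * value
            den += weight
    return num / den if den > 0 else None


rows = [[1.0, 10.0], [None, 12.0], [3.0, None]]
edges = build_bipartite(rows)
print(len(edges))                 # 4 observed cells -> 4 edges
print(impute(edges, 1, "feat0"))  # neighbors at t=0 and t=2 weigh equally
```

A larger `tau` flattens the weights toward a plain per-feature mean, while a small `tau` lets nearby time steps dominate; a trained graph neural network would learn such an aggregation instead of fixing it.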


2020 ◽  
Author(s):  
Angelyn Lao ◽  
Heriberto Cabezas ◽  
Ákos Orosz ◽  
Ferenc Friedler ◽  
Raymond Tan

We propose a process graph (P-graph) approach to develop ecosystem networks from knowledge of the properties of the component species. Originally developed as a process engineering tool for designing industrial plants, the P-graph framework has key advantages over conventional ecological network analysis (ENA) techniques. A P-graph is a bipartite graph consisting of two types of nodes, which we propose using to represent the components of an ecosystem. Compartments within ecosystems (e.g., organism species) are represented by one class of nodes, while the roles or functions that they play relative to other compartments are represented by a second class of nodes. This bipartite graph representation enables a powerful, unambiguous representation of relationships among ecosystem compartments, which can come in tangible (e.g., mass flow in predation) or intangible form (e.g., symbiosis). For example, within a P-graph, the distinct roles of bees as pollinators for some plants and as prey for some animals can be explicitly represented, which would not otherwise be possible using conventional ENA. After a discussion of the mapping of ecosystems into P-graph, we also discuss how this framework can be used to guide understanding of complex networks that exist in nature. Two component algorithms of P-graph, namely maximal structure generation (MSG) and solution structure generation (SSG), are shown to be particularly useful for ENA. This method can be used to determine the (a) effects of loss of specific ecosystem compartments due to extinction, (b) potential efficacy of ecosystem reconstruction efforts, and (c) maximum sustainable exploitation of ecosystem services by humans. We illustrate the use of P-graph for the analysis of ecosystem compartment loss using a small-scale stylized case study, and further propose a new criticality index that can be easily derived from SSG results.
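The bipartite compartment/role structure can be illustrated with the abstract's bee example, where one compartment fills two roles (pollinator and prey). This toy sketch assumes a made-up four-compartment ecosystem and a simple fixed-point extinction cascade; `surviving`, `cascade_size`, `ROLES`, and `BASAL` are hypothetical names, and the score is neither the paper's SSG-based criticality index nor an implementation of MSG or SSG.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical toy ecosystem. Each role (function node) maps to
# (compartments it requires, compartments it supports).
Roles = Dict[str, Tuple[FrozenSet[str], FrozenSet[str]]]

ROLES: Roles = {
    "nectar feeding": (frozenset({"flower"}), frozenset({"bee"})),
    "pollination": (frozenset({"sunlight", "bee"}), frozenset({"flower"})),
    "predation on bees": (frozenset({"bee"}), frozenset({"bird"})),
}
COMPARTMENTS = {"sunlight", "flower", "bee", "bird"}
BASAL = {"sunlight"}  # assumed to persist without support from any role


def surviving(compartments: Set[str], basal: Set[str], roles: Roles, lost: str) -> Set[str]:
    """Compartments left after removing `lost` and propagating the cascade.

    Simplified rule (an assumption, not P-graph's MSG/SSG algorithms): a role
    works only if all its required compartments are present, and a non-basal
    compartment persists only while some working role supports it.
    """
    alive = set(compartments) - {lost}
    changed = True
    while changed:
        changed = False
        # Union of everything supported by roles whose requirements are met.
        supported = set().union(*(out for req, out in roles.values() if req <= alive))
        for c in list(alive):
            if c not in basal and c not in supported:
                alive.discard(c)
                changed = True
    return alive


def cascade_size(compartments: Set[str], basal: Set[str], roles: Roles, lost: str) -> int:
    """Hypothetical loss score: how many other compartments disappear."""
    return len(compartments) - 1 - len(surviving(compartments, basal, roles, lost))


for c in sorted(COMPARTMENTS):
    print(c, cascade_size(COMPARTMENTS, BASAL, ROLES, c))
```

In this toy network, removing the bee also removes the flower it pollinates and the bird that preys on it, which is the kind of dual-role dependency the abstract highlights.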


2018 ◽  
Vol 25 (10) ◽  
pp. 1311-1321 ◽  
Author(s):  
Behrooz Davazdahemami ◽  
Dursun Delen

Objectives: This study extends prior research by combining a chronological pharmacovigilance network approach with machine-learning (ML) techniques to predict adverse drug events (ADEs) based on the drugs’ similarities in terms of the proteins they target in the human body. In particular, this research focuses on predicting drug-ADE associations for a set of 8 common and high-risk ADEs.

Materials and Methods: A large collection of annotated MEDLINE biomedical articles was used to construct a drug-ADE network, and the network was further equipped with information about drugs’ target proteins. Several network metrics were extracted and used as predictors in ML algorithms to predict the existence of network edges (ie, associations or relationships).

Results: Gradient boosted trees (GBTs), an ensemble ML algorithm, outperformed other prediction methods in identifying drug-ADE associations, with an overall accuracy of 92.8% on the validation sample. The prediction model was able to predict drug-ADE associations, on average, 3.84 years earlier than they were actually mentioned in the biomedical literature.

Conclusion: While network analysis and ML techniques were used separately in prior ADE studies, our results show that combining them boosts their predictive power. Moreover, our results highlight the superior capability of ensemble-type ML methods in capturing drug-ADE patterns compared to regular (ie, singular) ML algorithms.
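Turning a time-stamped drug-ADE network into link-prediction features can be sketched as follows. Everything here is a hypothetical illustration: the edge tuples, the target sets, the drug and protein identifiers, the helper names (`jaccard`, `pair_features`, `label`), and the three features (drug degree, number of other drugs linked to the ADE, best target-protein Jaccard similarity) are assumptions, not the study's actual network metrics. A cutoff year splits the network chronologically so that features use only earlier reports while labels come from later ones.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical inputs: literature-derived (drug, ADE, first year reported)
# associations and each drug's set of target proteins.
Edge = Tuple[str, str, int]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two target-protein sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_features(drug: str, ade: str, edges: List[Edge],
                  targets: Dict[str, Set[str]], cutoff: int) -> List[float]:
    """Network features for a candidate pair, using only edges before `cutoff`."""
    past = [(d, a) for d, a, year in edges if year < cutoff]
    drug_degree = sum(1 for d, _ in past if d == drug)
    other_drugs = {d for d, a in past if a == ade and d != drug}
    # Highest target similarity between `drug` and drugs already linked to `ade`.
    target_sim = max((jaccard(targets.get(drug, set()), targets.get(d, set()))
                      for d in other_drugs), default=0.0)
    return [float(drug_degree), float(len(other_drugs)), target_sim]


def label(drug: str, ade: str, edges: List[Edge], cutoff: int) -> int:
    """1 if the association is first reported in or after `cutoff`, else 0."""
    return int(any(d == drug and a == ade and year >= cutoff for d, a, year in edges))


edges = [("drugA", "hepatotoxicity", 2001), ("drugB", "hepatotoxicity", 2003),
         ("drugA", "rash", 2004), ("drugC", "hepatotoxicity", 2010)]
targets = {"drugA": {"P1", "P2"}, "drugB": {"P2", "P3"}, "drugC": {"P1", "P2", "P3"}}
print(pair_features("drugC", "hepatotoxicity", edges, targets, cutoff=2008))
print(label("drugC", "hepatotoxicity", edges, cutoff=2008))  # 1
```

Feature rows and labels built this way could then be passed to a gradient-boosted tree learner such as scikit-learn's `GradientBoostingClassifier`.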

