BioKEEN: a library for learning and evaluating biological knowledge graph embeddings

Mehdi Ali; Charles Tapley Hoyt; Daniel Domingo-Fernández; Jens Lehmann; Hajira Jabeen

doi:10.1093/bioinformatics/btz117


Bioinformatics ◽

10.1093/bioinformatics/btz117 ◽

2019 ◽

Vol 35 (18) ◽

pp. 3538-3540 ◽

Cited By ~ 8

Author(s):

Mehdi Ali ◽

Charles Tapley Hoyt ◽

Daniel Domingo-Fernández ◽

Jens Lehmann ◽

Hajira Jabeen

Keyword(s):

Supplementary Information ◽

Knowledge Graph ◽

Biological Knowledge ◽

Command Line ◽

Graph Embeddings ◽

Command Line Interface ◽

Software Ecosystem ◽

Mapping Resource ◽

Significant Attention

Abstract

Summary: Knowledge graph embeddings (KGEs) have received significant attention in other domains due to their ability to predict links and create dense representations for graphs’ nodes and edges. However, the software ecosystem for their application to bioinformatics remains limited and inaccessible for users without expertise in programming and machine learning. Therefore, we developed BioKEEN (Biological KnowlEdge EmbeddiNgs) and PyKEEN (Python KnowlEdge EmbeddiNgs) to facilitate their easy use through an interactive command line interface. Finally, we present a case study in which we used a novel biological pathway mapping resource to predict links that represent pathway crosstalks and hierarchies.

Availability and implementation: BioKEEN and PyKEEN are open source Python packages publicly available under the MIT License at https://github.com/SmartDataAnalytics/BioKEEN and https://github.com/SmartDataAnalytics/PyKEEN

Supplementary information: Supplementary data are available at Bioinformatics online.
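To make "predicting links" from dense node and edge representations concrete, here is a minimal sketch using TransE-style scoring. TransE is a well-known KGE model chosen here as an assumption; the abstract does not say which models the packages implement, and this is not the BioKEEN or PyKEEN API. The entity names, relation name and 2-D vectors are hypothetical toy values; a real model would learn them from training triples.

```python
import math

# Hypothetical toy embeddings (2-D); a real model learns these from triples.
ENTITY = {
    "pathway_A": (0.0, 0.0),
    "pathway_B": (1.0, 0.0),
    "pathway_C": (0.0, 3.0),
}
RELATION = {"is_part_of": (1.0, 0.0)}


def transe_score(head, relation, tail):
    """TransE plausibility: negative L2 distance between (head + relation) and tail."""
    h, r, t = ENTITY[head], RELATION[relation], ENTITY[tail]
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_tails(head, relation):
    """Rank every other entity as a candidate tail, most plausible first."""
    candidates = [e for e in ENTITY if e != head]
    return sorted(candidates, key=lambda t: transe_score(head, relation, t), reverse=True)


if __name__ == "__main__":
    print(rank_tails("pathway_A", "is_part_of"))
```

Candidate links are ranked by score, so a higher (less negative) score marks a more plausible triple under this model.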


BioKEEN: A library for learning and evaluating biological knowledge graph embeddings

10.1101/475202 ◽

2018 ◽

Cited By ~ 3

Author(s):

Mehdi Ali ◽

Charles Tapley Hoyt ◽

Daniel Domingo-Fernández ◽

Jens Lehmann ◽

Hajira Jabeen

Keyword(s):

Knowledge Graph ◽

Biological Knowledge ◽

Command Line ◽

Graph Embeddings ◽

Command Line Interface ◽

Software Ecosystem ◽

Link Type ◽

Mapping Resource ◽

Significant Attention

Abstract

Knowledge graph embeddings (KGEs) have received significant attention in other domains due to their ability to predict links and create dense representations for graphs’ nodes and edges. However, the software ecosystem for their application to bioinformatics remains limited and inaccessible for users without expertise in programming and machine learning. Therefore, we developed BioKEEN (Biological KnowlEdge EmbeddiNgs) and PyKEEN (Python KnowlEdge EmbeddiNgs) to facilitate their easy use through an interactive command line interface. Finally, we present a case study in which we used a novel biological pathway mapping resource to predict links that represent pathway crosstalks and hierarchies.

Availability: BioKEEN and PyKEEN are open source Python packages publicly available under the MIT License at https://github.com/SmartDataAnalytics/BioKEEN and https://github.com/SmartDataAnalytics/PyKEEN as well as through PyPI.


DamageProfiler: Fast damage pattern calculation for ancient DNA

Bioinformatics ◽

10.1093/bioinformatics/btab190 ◽

2021 ◽

Author(s):

Judith Neukamm ◽

Alexander Peltzer ◽

Kay Nieselt

Keyword(s):

Ancient DNA ◽

Source Code ◽

Supplementary Information ◽

Command Line ◽

Central Importance ◽

Command Line Interface ◽

Analysis Pipeline ◽

File Formats ◽

Programming Knowledge ◽

User Friendly

Abstract

Motivation: In ancient DNA research, the authentication of ancient samples based on specific features remains a crucial step in data analysis. Because of this central importance, researchers lacking deeper programming knowledge should be able to run a basic damage authentication analysis. Such software should be user-friendly and easy to integrate into an analysis pipeline.

Results: DamageProfiler is a Java-based, stand-alone software to determine damage patterns in ancient DNA. The results are provided in various file formats and plots for further processing. DamageProfiler has an intuitive graphical as well as command line interface that allows the tool to be easily embedded into an analysis pipeline.

Availability: All of the source code is freely available on GitHub (https://github.com/Integrative-Transcriptomics/DamageProfiler).

Supplementary information: Supplementary data are available at Bioinformatics online.


CoV-Seq: SARS-CoV-2 Genome Analysis and Visualization

10.1101/2020.05.01.071050 ◽

2020 ◽

Cited By ~ 3

Author(s):

Boxiang Liu ◽

Kaibo Liu ◽

He Zhang ◽

Liang Zhang ◽

Yuchen Bian ◽

...

Keyword(s):

Ad Hoc ◽

Rapid Analysis ◽

Supplementary Information ◽

Command Line ◽

Command Line Interface ◽

Link Type ◽

Global Pandemic ◽

Fast Pace ◽

Public Repositories ◽

Programming Knowledge

Abstract

Summary: COVID-19 has become a global pandemic not long after its inception in late 2019. SARS-CoV-2 genomes are being sequenced and shared on public repositories at a fast pace. To keep up with these updates, scientists need to frequently refresh and reclean datasets, which is ad hoc and labor-intensive. Further, scientists with limited bioinformatics or programming knowledge may find it difficult to analyze SARS-CoV-2 genomes. To address these challenges, we developed CoV-Seq, a webserver to enable simple and rapid analysis of SARS-CoV-2 genomes. Given a new sequence, CoV-Seq automatically predicts gene boundaries and identifies genetic variants, which are presented in an interactive genome visualizer and are downloadable for further analysis. A command-line interface is also available for high-throughput processing.

Availability and implementation: CoV-Seq is implemented in Python and JavaScript. The webserver is available at http://covseq.baidu.com/ and the source code is available from https://github.com/boxiangliu/[email protected]

Supplementary information: Supplementary information is available at bioRxiv online.


OLOGRAM: determining significance of total overlap length between genomic regions sets

Bioinformatics ◽

10.1093/bioinformatics/btz810 ◽

2019 ◽

Cited By ~ 3

Author(s):

Q Ferré ◽

G Charbonnier ◽

N Sadouni ◽

F Lopez ◽

Y Kermezli ◽

...

Keyword(s):

Negative Binomial ◽

Statistical Significance ◽

Functional Relation ◽

Supplementary Information ◽

P Value ◽

Binomial Model ◽

Command Line ◽

Command Line Interface ◽

Bioinformatics Analyses ◽

Genomic Regions

Abstract

Motivation: Various bioinformatics analyses provide sets of genomic coordinates of interest. Whether two such sets possess a functional relation is a frequent question. This is often determined by interpreting the statistical significance of their overlaps. However, only a few existing methods consider the lengths of the overlap, and they do not provide a resolutive P-value.

Results: Here, we introduce OLOGRAM, which performs overlap statistics between sets of genomic regions described in BEDs or GTF. It uses Monte Carlo simulation, taking into account both the distributions of region and inter-region lengths, to fit a negative binomial model of the total overlap length. Exclusion of user-defined genomic areas during the shuffling is supported.

Availability and implementation: This tool is available through the command line interface of the pygtftk toolkit. It has been tested on Linux and OSX and is available on Bioconda and from https://github.com/dputhier/pygtftk under the GNU GPL license.

Contact: [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.
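The general idea of shuffling regions, measuring total overlap length, and fitting a negative binomial to the null distribution can be sketched as follows. This is a simplified illustration under stated assumptions, not OLOGRAM's implementation or the pygtftk API: the shuffle below only preserves region lengths (it ignores inter-region lengths and exclusion areas), the fit uses the method of moments, and all function names are hypothetical.

```python
import math
import random


def total_overlap(a, b):
    """Sum of pairwise overlap lengths between two lists of half-open (start, end) intervals."""
    return sum(max(0, min(e1, e2) - max(s1, s2)) for s1, e1 in a for s2, e2 in b)


def shuffle_regions(regions, genome_len, rng):
    """Toy shuffle (assumption): keep each region's length, draw a new uniform start."""
    out = []
    for s, e in regions:
        length = e - s
        start = rng.randrange(0, genome_len - length + 1)
        out.append((start, start + length))
    return out


def fit_negative_binomial(samples):
    """Method-of-moments fit; returns (r, p), or None if the data are not overdispersed."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    if mean <= 0 or var <= mean:
        return None
    return mean * mean / (var - mean), mean / var


def nb_sf(k, r, p):
    """P(X >= k) for a negative binomial with size r and success probability p."""
    def log_pmf(x):
        return (math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
                + r * math.log(p) + x * math.log(1 - p))
    return max(0.0, 1.0 - sum(math.exp(log_pmf(x)) for x in range(k)))


def overlap_p_value(a, b, genome_len, n_shuffles=200, seed=0):
    """Monte Carlo null of total overlap length, then a negative binomial tail probability."""
    rng = random.Random(seed)
    observed = total_overlap(a, b)
    null = [total_overlap(shuffle_regions(a, genome_len, rng), b) for _ in range(n_shuffles)]
    fit = fit_negative_binomial(null)
    if fit is None:
        # Fall back to the empirical tail when the moment fit is undefined.
        return sum(x >= observed for x in null) / n_shuffles
    return nb_sf(observed, *fit)
```

Fitting a parametric model to the shuffled overlaps, rather than counting how many shuffles exceed the observed value, is what allows a tail probability smaller than one over the number of shuffles.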


GfaViz: flexible and interactive visualization of GFA sequence graphs

Bioinformatics ◽

10.1093/bioinformatics/bty1046 ◽

2018 ◽

Vol 35 (16) ◽

pp. 2853-2855 ◽

Cited By ~ 2

Author(s):

Giorgio Gonnella ◽

Niklas Niehus ◽

Stefan Kurtz

Keyword(s):

Interactive Visualization ◽

Supplementary Information ◽

Command Line ◽

Supplementary Data ◽

Command Line Interface ◽

Vector Graphics ◽

Fragment Assembly ◽

Or Groups ◽

Graphical Tool ◽

Standard Configuration

Abstract

Summary: The graphical fragment assembly (GFA) formats are emerging standard formats for the representation of sequence graphs. Although GFA 1 was primarily targeting assembly graphs, the newer GFA 2 format introduces several features that make it suitable for representing other kinds of information, such as scaffolding graphs, variation graphs, alignment graphs and colored metagenomic graphs. Here, we present GfaViz, an interactive graphical tool for the visualization of sequence graphs in GFA format. The software supports all new features of GFA 2 and introduces conventions for their visualization. The user can choose between two different layouts and multiple styles for representing single elements or groups. All customizations can be stored in custom tags of the GFA format itself, without requiring external configuration files. Stylesheets are supported for storing standard configuration options for groups of files. The visualizations can be exported to raster and vector graphics formats. A command line interface allows for batch generation of images.

Availability and implementation: GfaViz is available at https://github.com/ggonnella/gfaviz

Supplementary information: Supplementary data are available at Bioinformatics online.


Incorporating Attributes Semantics into Knowledge Graph Embeddings

2021 IEEE 24th International Conference on Computer Supported Cooperative Work in Design (CSCWD) ◽

10.1109/cscwd49262.2021.9437876 ◽

2021 ◽

Author(s):

Mingyang Li ◽

Neng Gao ◽

Chenyang Tu ◽

Jia Peng ◽

Min Li

Keyword(s):

Knowledge Graph ◽

Graph Embeddings


Dementia key gene identification with multi-layered SNP-gene-disease network

Bioinformatics ◽

10.1093/bioinformatics/btaa814 ◽

2020 ◽

Vol 36 (Supplement_2) ◽

pp. i831-i839

Author(s):

Dong-gi Lee ◽

Myungjun Kim ◽

Sang Joon Son ◽

Chang Hyung Hong ◽

Hyunjung Shin

Keyword(s):

Candidate Genes ◽

Learning Algorithm ◽

Search Space ◽

Supplementary Information ◽

Gene Identification ◽

Nucleotide Polymorphisms ◽

Disease Network ◽

Single Nucleotide ◽

Key Genes ◽

Significant Attention

Abstract

Motivation: Recently, various approaches for diagnosing and treating dementia have received significant attention, especially in identifying key genes that are crucial for dementia. If the mutations of such key genes could be tracked, it would be possible to predict the time of onset of dementia and significantly aid in developing drugs to treat dementia. However, gene finding involves tremendous cost, time and effort. To alleviate these problems, research on utilizing computational biology to decrease the search space of candidate genes is actively conducted. In this study, we propose a framework in which diseases, genes and single-nucleotide polymorphisms are represented by a layered network, and key genes are predicted by a machine learning algorithm. The algorithm utilizes a network-based semi-supervised learning model that can be applied to layered data structures.

Results: The proposed method was applied to a dataset extracted from public databases related to diseases and genes with data collected from 186 patients. A portion of key genes obtained using the proposed method was verified in silico through PubMed literature, and the remaining genes were left as possible candidate genes.

Availability and implementation: The code for the framework will be available at http://www.alphaminers.net/.

Supplementary information: Supplementary data are available at Bioinformatics online.
