Disease ontologies for knowledge graphs

Background: Data integration to build a biomedical knowledge graph is a challenging task. Data sources and publications use multiple disease ontologies, each with its own hierarchy. A common task is to map between ontologies, find disease clusters and finally build a representation of the chosen disease area. There is a shortage of published resources and tools to facilitate interactive, efficient and flexible cross-referencing and analysis of the multiple disease ontologies commonly found in data sources and research. Results: Our results are represented as a knowledge graph solution that uses disease ontology cross-references and facilitates switching between ontology hierarchies for data integration and other tasks. Conclusions: Grakn core with pre-installed “Disease ontologies for knowledge graphs” facilitates building a biomedical knowledge graph and provides an elegant solution to the multiple disease ontologies problem.
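
The cross-referencing task described in this abstract can be illustrated with a minimal Python sketch. It is not the Grakn-based solution from the paper: the ontology prefixes, identifiers and function names below are hypothetical placeholders, and following cross-references transitively is a simplifying assumption, since real cross-references are not always exact equivalences.

```python
from collections import defaultdict

# Hypothetical cross-reference pairs; these identifiers are placeholders,
# not entries from any real disease ontology.
XREFS = [
    ("ONTO_A:001", "ONTO_B:D100"),
    ("ONTO_B:D100", "ONTO_C:9"),
    ("ONTO_A:002", "ONTO_B:D200"),
]


def build_xref_index(pairs):
    """Build a symmetric adjacency map from cross-reference pairs."""
    index = defaultdict(set)
    for left, right in pairs:
        index[left].add(right)
        index[right].add(left)
    return index


def equivalents(term, index):
    """Return every term reachable from `term` by following cross-references."""
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for neighbour in index.get(current, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    seen.discard(term)
    return seen


def map_to_ontology(term, prefix, index):
    """Return the cross-referenced terms that belong to the ontology `prefix`."""
    return sorted(t for t in equivalents(term, index) if t.startswith(prefix + ":"))


index = build_xref_index(XREFS)
print(map_to_ontology("ONTO_A:001", "ONTO_C", index))
```

Here a term from one ontology is mapped to another ontology through an intermediate one, which is the kind of switching between hierarchies the abstract refers to.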


Virtual Knowledge Graphs: An Overview of Systems and Use Cases

Data Intelligence ◽

10.1162/dint_a_00011 ◽

2019 ◽

Vol 1 (3) ◽

pp. 201-223 ◽

Cited By ~ 17

Author(s):

Guohui Xiao ◽

Linfang Ding ◽

Benjamin Cogrel ◽

Diego Calvanese

Keyword(s):

Data Integration ◽

Domain Knowledge ◽

Data Access ◽

Use Cases ◽

Future Research ◽

Knowledge Graph ◽

Research Directions ◽

Wide Range ◽

Future Research Directions ◽

Knowledge Graphs

In this paper, we present the virtual knowledge graph (VKG) paradigm for data integration and access, also known in the literature as Ontology-based Data Access. Instead of structuring the integration layer as a collection of relational tables, the VKG paradigm replaces the rigid structure of tables with the flexibility of graphs that are kept virtual and embed domain knowledge. We explain the main notions of this paradigm, its tooling ecosystem and significant use cases in a wide range of applications. Finally, we discuss future research directions.


Poster Paper Data Integration for Supporting Biomedical Knowledge Graph Creation at Large-Scale

Lecture Notes in Computer Science - Data Integration in the Life Sciences ◽

10.1007/978-3-030-06016-9_9 ◽

2018 ◽

pp. 91-96

Author(s):

Samaneh Jozashoori ◽

Tatiana Novikova ◽

Maria-Esther Vidal

Keyword(s):

Data Integration ◽

Large Scale ◽

Knowledge Graph ◽

Biomedical Knowledge ◽

Poster Paper


Task-Driven Knowledge Graph Filtering Improves Prioritizing Drugs for Repurposing

10.21203/rs.3.rs-721705/v1 ◽

2021 ◽

Author(s):

Florin Ratajczak ◽

Mitchell Joblin ◽

Martin Ringsquandl ◽

Marcel Hildebrandt

Keyword(s):

Domain Knowledge ◽

Drug Repurposing ◽

New Drugs ◽

Knowledge Graph ◽

Biomedical Data ◽

Biomedical Knowledge ◽

Relation Type ◽

Knowledge Graphs ◽

Improved Performance ◽

Efficient Learning

Background: Drug repurposing aims at finding new targets for already developed drugs. It becomes more relevant as the cost of discovering new drugs steadily increases. To find new potential targets for a drug, an abundance of methods and existing biomedical knowledge from different domains can be leveraged. Recently, knowledge graphs have emerged in the biomedical domain that integrate information about genes, drugs, diseases and other biological domains. Knowledge graphs can be used to predict new connections between compounds and diseases, leveraging the interconnected biomedical data around them. While real-world use cases such as drug repurposing are only interested in one specific relation type, widely used knowledge graph embedding models simultaneously optimize over all relation types in the graph. This can lead the models to underfit the data that is most relevant for the desired relation type. We propose a method that leverages domain knowledge in the form of metapaths and uses them to filter two biomedical knowledge graphs (Hetionet and DRKG) in order to improve performance on the drug repurposing prediction task while simultaneously increasing computational efficiency. Results: We find that our method reduces the number of entities by 60% on Hetionet and 26% on DRKG, while leading to an improvement in prediction performance of up to 40.8% on Hetionet and 12.4% on DRKG, with an average improvement of 17.5% on Hetionet and 5.1% on DRKG. Additionally, prioritization of antiviral compounds for SARS-CoV-2 improves after task-driven filtering is applied. Conclusion: Knowledge graphs contain facts that are counterproductive for specific tasks, in our case drug repurposing. We also demonstrate that these facts can be removed, resulting in improved performance on that task and a more efficient learning process.
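
As a rough illustration of metapath-based filtering, the sketch below keeps only triples whose head and tail node types are adjacent in at least one chosen metapath. This is a deliberately simplified stand-in, not the authors' procedure: the node names, type labels and the `filter_by_metapaths` helper are all hypothetical, and the rule ignores relation types and whether an edge lies on a complete metapath instance.

```python
def filter_by_metapaths(triples, node_type, metapaths):
    """Keep triples whose (head type, tail type) pair is adjacent in a metapath.

    triples   -- list of (head, relation, tail) tuples
    node_type -- dict mapping each node to its type label
    metapaths -- list of node-type sequences, e.g. ("Compound", "Gene", "Disease")
    """
    allowed = set()
    for path in metapaths:
        for left, right in zip(path, path[1:]):
            # Treat edges as undirected for the purpose of the filter.
            allowed.add((left, right))
            allowed.add((right, left))
    return [
        (head, rel, tail)
        for head, rel, tail in triples
        if (node_type[head], node_type[tail]) in allowed
    ]


# Toy graph with made-up nodes; none of these come from Hetionet or DRKG.
NODE_TYPE = {
    "drugX": "Compound",
    "geneY": "Gene",
    "diseaseZ": "Disease",
    "effectW": "SideEffect",
}
TRIPLES = [
    ("drugX", "binds", "geneY"),
    ("geneY", "associates", "diseaseZ"),
    ("drugX", "causes", "effectW"),
]
METAPATHS = [("Compound", "Gene", "Disease")]

print(filter_by_metapaths(TRIPLES, NODE_TYPE, METAPATHS))
```

In this toy case the side-effect edge is dropped because no chosen metapath connects a compound to a side effect, which mirrors the idea of removing facts that do not serve the target relation.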


Expanding a Database-derived Biomedical Knowledge Graph via Multi-relation Extraction from Biomedical Abstracts

10.1101/730085 ◽

2019 ◽

Author(s):

David N. Nicholson ◽

Daniel S. Himmelstein ◽

Casey S. Greene

Keyword(s):

Contextual Information ◽

Relation Extraction ◽

Publication Rate ◽

Knowledge Graph ◽

Biomedical Knowledge ◽

Text Annotation ◽

Label Function ◽

Manual Curation ◽

Function Combination ◽

Knowledge Graphs

Knowledge graphs support multiple research efforts by providing contextual information for biomedical entities, constructing networks, and supporting the interpretation of high-throughput analyses. These databases are populated via some form of manual curation, which is difficult to scale in the context of an increasing publication rate. Data programming is a paradigm that circumvents this arduous manual process by combining databases with simple rules and heuristics written as label functions, which are programs designed to automatically annotate textual data. Unfortunately, writing a useful label function requires substantial error analysis and is a nontrivial task that takes multiple days per function. This makes populating a knowledge graph with multiple node and edge types practically infeasible. We sought to accelerate the label function creation process by evaluating the extent to which label functions could be re-used across multiple edge types. We used a subset of an existing knowledge graph centered on disease, compound, and gene entities to evaluate label function re-use. We determined the best label function combination by comparing a baseline database-only model with the same model augmented with edge-specific or edge-mismatch label functions. We confirmed that adding edge-specific rather than edge-mismatch label functions often improves text annotation, and showed that this approach can incorporate novel edges into our source knowledge graph. We expect that continued development of this strategy has the potential to swiftly populate knowledge graphs with new discoveries, ensuring that these resources include cutting-edge results.
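
To make the notion of a label function concrete, here is a minimal Python sketch. The two heuristics, the label constants and the majority-vote combiner are illustrative assumptions only: they are not taken from the paper, and data programming normally combines label function outputs with a learned model rather than a simple vote.

```python
POSITIVE, NEGATIVE, ABSTAIN = 1, 0, -1


def lf_treats_keyword(sentence):
    """Hypothetical label function: vote POSITIVE if the sentence says 'treats'."""
    return POSITIVE if "treats" in sentence.lower() else ABSTAIN


def lf_negation(sentence):
    """Hypothetical label function: vote NEGATIVE on simple negation cues."""
    text = sentence.lower()
    if "does not" in text or "no effect" in text:
        return NEGATIVE
    return ABSTAIN


def majority_vote(sentence, label_functions):
    """Combine label function votes, ignoring abstentions; ties yield ABSTAIN."""
    votes = [lf(sentence) for lf in label_functions]
    positives = votes.count(POSITIVE)
    negatives = votes.count(NEGATIVE)
    if positives == negatives:
        return ABSTAIN
    return POSITIVE if positives > negatives else NEGATIVE


LABEL_FUNCTIONS = [lf_treats_keyword, lf_negation]

for sentence in (
    "Compound A treats disease B.",
    "Compound A does not affect disease B.",
    "Compound A and disease B were studied.",
):
    print(sentence, "->", majority_vote(sentence, LABEL_FUNCTIONS))
```

Re-use across edge types, as studied in the abstract, would amount to applying a list such as `LABEL_FUNCTIONS` written for one edge type to sentences mentioning a different pair of entity types.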


Property-Based Semantic Similarity Criteria to Evaluate the Overlaps of Schemas

Algorithms ◽

10.3390/a14080241 ◽

2021 ◽

Vol 14 (8) ◽

pp. 241

Author(s):

Lan Huang ◽

Yuanwei Zhao ◽

Bo Wang ◽

Dongxu Zhang ◽

Rui Zhang ◽

...

Keyword(s):

Data Integration ◽

Semantic Similarity ◽

Domain Knowledge ◽

Similarity Criteria ◽

Knowledge Graph ◽

Legacy Systems ◽

High Quality ◽

Cross Domain ◽

Knowledge Graphs

Knowledge graph-based data integration is a practical methodology for constructing services that integrate heterogeneous legacy databases. However, it is neither efficient nor economical to build a new cross-domain knowledge graph on top of the schemas of each legacy database for a specific integration application rather than reusing existing high-quality knowledge graphs. Consequently, a question arises as to whether an existing knowledge graph is compatible with cross-domain queries and with the heterogeneous schemas of the legacy systems. An effective criterion is urgently needed to evaluate such compatibility, as it limits the upper bound on the quality of the integration. This research studies the semantic similarity of schemas from the aspect of properties. It provides a set of in-depth criteria, namely coverage and flexibility, to evaluate the pairwise compatibility between schemas. It takes advantage of the properties of knowledge graphs to evaluate the overlaps between schemas and defines the weights of entity types in order to perform precise compatibility computation. The effectiveness of the criteria for evaluating the compatibility between knowledge graphs and cross-domain queries is demonstrated using a case study.


Mobile Software Assurance Informed through Knowledge Graph Construction: The OWASP Threat of Insecure Data Storage

Journal of Computer Science Research ◽

10.30564/jcsr.v2i2.1765 ◽

2020 ◽

Vol 2 (2) ◽

Author(s):

Suzanna Schmeelk ◽

Lixin Tao

Keyword(s):

Data Storage ◽

Program Analysis ◽

Web Application ◽

Security Analysis ◽

Knowledge Graph ◽

Healthcare Applications ◽

Sensitive Data ◽

Knowledge Graphs ◽

Mobile Malware Detection ◽

Software Assurance

Many organizations, to save costs, are moving to the Bring Your Own Mobile Device (BYOD) model and adopting applications built by third parties at an unprecedented rate. Our research examines software assurance methodologies, focusing specifically on the security analysis coverage of program analysis for mobile malware detection, mitigation, and prevention. This research focuses on secure software development of Android applications by developing knowledge graphs for threats reported by the Open Web Application Security Project (OWASP). OWASP maintains lists of the top ten security threats to web and mobile applications. We develop knowledge graphs based on the two most recent years of top ten threats and show how the knowledge graph relationships can be discovered in mobile application source code. We analyze 200+ healthcare applications from GitHub to understand their software assurance with respect to one of the OWASP top ten mobile threats, the threat of “Insecure Data Storage.” We find that many of the applications store personally identifying information (PII) in potentially vulnerable places, leaving users exposed to a higher risk of losing their sensitive data.


Methodology of Big Data Integration from A Priori Unknown Heterogeneous Data Sources

Proceedings of the 2018 2nd International Conference on Computer Science and Artificial Intelligence - CSAI '18 ◽

10.1145/3297156.3297249 ◽

2018 ◽

Author(s):

Alexey Samoylov ◽

Nikolay Sergeev ◽

Margarita Kucherova ◽

Boris Denisov

Keyword(s):

Big Data ◽

Data Integration ◽

A Priori ◽

Heterogeneous Data ◽

Data Sources ◽

Heterogeneous Data Sources


TransET: Knowledge Graph Embedding with Entity Types

Electronics ◽

10.3390/electronics10121407 ◽

2021 ◽

Vol 10 (12) ◽

pp. 1407

Author(s):

Peng Wang ◽

Jing Zhou ◽

Yuzhang Liu ◽

Xingchen Zhou

Keyword(s):

Link Prediction ◽

State Of The Art ◽

Score Function ◽

Graph Embedding ◽

Vector Spaces ◽

Knowledge Graph ◽

Semantic Features ◽

Knowledge Graphs ◽

Real World Datasets ◽

Low Dimensional

Knowledge graph embedding aims to embed entities and relations into low-dimensional vector spaces. Most existing methods only focus on triple facts in knowledge graphs. In addition, models based on translation or distance measurement cannot fully represent complex relations. As well-constructed prior knowledge, entity types can be employed to learn the representations of entities and relations. In this paper, we propose a novel knowledge graph embedding model named TransET, which takes advantage of entity types to learn more semantic features. More specifically, circle convolution based on the embeddings of entities and entity types is utilized to map the head entity and tail entity to type-specific representations, and a translation-based score function is then used to learn the representations of triples. We evaluated our model on real-world datasets with two benchmark tasks, link prediction and triple classification. Experimental results demonstrate that it outperforms state-of-the-art models in most cases.
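
The two ingredients named in this abstract, a convolution of entity and type embeddings followed by a translation-based score, can be sketched in plain Python. This is an assumption-laden illustration and not the TransET formulation itself: it reads "circle convolution" as standard circular convolution, uses a TransE-style L1 score, and the function names and toy vectors are hypothetical.

```python
def circular_convolution(a, b):
    """Circular convolution: out[k] = sum_i a[i] * b[(k - i) mod n]."""
    n = len(a)
    if len(b) != n:
        raise ValueError("vectors must have the same length")
    return [sum(a[i] * b[(k - i) % n] for i in range(n)) for k in range(n)]


def translation_score(head, relation, tail):
    """TransE-style score: negative L1 distance of head + relation from tail."""
    return -sum(abs(h + r - t) for h, r, t in zip(head, relation, tail))


def type_aware_score(head, head_type, relation, tail, tail_type):
    """Map entities to type-specific vectors, then apply the translation score."""
    head_specific = circular_convolution(head, head_type)
    tail_specific = circular_convolution(tail, tail_type)
    return translation_score(head_specific, relation, tail_specific)


# Toy embeddings, chosen by hand for illustration only.
head = [1.0, 0.0]
tail = [1.5, 0.5]
relation = [0.5, 0.5]
neutral_type = [1.0, 0.0]  # convolving with this vector leaves the input unchanged

print(type_aware_score(head, neutral_type, relation, tail, neutral_type))
```

A score closer to zero indicates a more plausible triple under this scoring convention; with the neutral type vector the sketch reduces to a plain translation score.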


Content‐based and knowledge graph‐based paper recommendation: Exploring user preferences with the knowledge graphs for scientific paper recommendation

Concurrency and Computation Practice and Experience ◽

10.1002/cpe.6227 ◽

2021 ◽

Author(s):

Hao Tang ◽

Baisong Liu ◽

Jiangbo Qian

Keyword(s):

Scientific Paper ◽

User Preferences ◽

Knowledge Graph ◽

Knowledge Graphs


Development of Knowledge Graph for Data Management Related to Flooding Disasters Using Open Data

Future Internet ◽

10.3390/fi13050124 ◽

2021 ◽

Vol 13 (5) ◽

pp. 124

Author(s):

Jiseong Son ◽

Chul-Su Lim ◽

Hyoung-Seop Shim ◽

Ji-Sun Kang

Keyword(s):

Artificial Intelligence ◽

Domain Knowledge ◽

Open Data ◽

Heterogeneous Data ◽

Big Data Analysis ◽

Knowledge Graph ◽

Cross Domain ◽

Disaster Data ◽

Knowledge Graphs ◽

Open Datasets

Despite the development of various technologies and systems that use artificial intelligence (AI) to solve disaster-related problems, difficult challenges are still being encountered. Data are the foundation for solving diverse disaster problems using AI, big data analysis, and so on, so these data deserve close attention. Disaster data are specific to the domain of each disaster type, are heterogeneous, and lack interoperability. In particular, open data related to disasters differ in source and format because they are collected by different organizations. Moreover, the vocabularies used in each domain are inconsistent. This study proposes a knowledge graph to resolve the heterogeneity among various disaster data and provide interoperability among domains. Among disaster domains, we describe the knowledge graph for flooding disasters using Korean open datasets and cross-domain knowledge graphs. Furthermore, the proposed knowledge graph is used to assist in solving and managing disaster problems.
