RTX-KG2: a system for building a semantically standardized knowledge graph for translational biomedicine

2021 ◽  
Author(s):  
E. C. Wood ◽  
Amy K. Glen ◽  
Lindsey G. Kvarfordt ◽  
Finn Womack ◽  
Liliana Acevedo ◽  
...  

Background: Biomedical translational science is increasingly leveraging computational reasoning on large repositories of structured knowledge (such as the Unified Medical Language System (UMLS), the Semantic Medline Database (SemMedDB), ChEMBL, DrugBank, and the Small Molecule Pathway Database (SMPDB)) and data to facilitate discovery of new therapeutic targets and modalities. Since 2016, the NCATS Biomedical Data Translator project has been working to federate autonomous reasoning agents and knowledge providers within a distributed system for answering translational questions. Within that project and within the field more broadly, there is an urgent need for an open-source framework that can efficiently and reproducibly build an integrated, standards-compliant, and comprehensive biomedical knowledge graph that can be either downloaded in standard serialized form or queried via a public application programming interface (API) that accords with the FAIR data principles.
Results: To create a knowledge provider system within the Translator project, we have developed RTX-KG2, an open-source software system for building, and hosting a web API for querying, a biomedical knowledge graph. The system uses an Extract-Transform-Load (ETL) approach to integrate 70 knowledge sources (including the aforementioned sources) into a single knowledge graph. The semantic layer and schema for RTX-KG2 follow the standard Biolink metamodel to maximize interoperability within Translator. RTX-KG2 is currently being used by multiple Translator reasoning agents, both in its downloadable form and via its SmartAPI-registered web interface. JavaScript Object Notation (JSON) serializations of RTX-KG2 are available for download in both pre-canonicalized form and canonicalized form (in which synonym concepts are merged). The current canonicalized version (KG2.7.3) of RTX-KG2 contains 6.4M concept nodes and 39.3M relationship edges with a rich set of 77 relationship types.
Conclusion: RTX-KG2 is the first open-source knowledge graph of which we are aware that integrates UMLS, SemMedDB, ChEMBL, DrugBank, SMPDB, and 65 additional knowledge sources within a knowledge graph that conforms to the Biolink standard for its semantic layer and schema at the intersections of these databases. RTX-KG2 is publicly available for querying via its API at arax.ncats.io/api/rtxkg2/v1.2/openapi.json. The code to build RTX-KG2 is publicly available at github:RTXteam/RTX-KG2.
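
The abstract distinguishes a pre-canonicalized graph from a canonicalized one in which synonym concepts are merged. The following is a minimal Python sketch of what such a merge could look like on a toy edge list. It is not taken from the RTX-KG2 codebase: the function name `canonicalize`, the placeholder identifiers, and the synonym map are all hypothetical, and the real build process may differ substantially.

```python
def canonicalize(edges, synonym_to_canonical):
    """Remap every edge endpoint to its canonical ID and drop duplicate edges.

    `edges` is an iterable of (subject, predicate, object) triples.
    `synonym_to_canonical` maps a synonym node ID to the ID chosen to represent it.
    Both structures are assumptions made for this illustration.
    """
    merged = set()
    for subject, predicate, obj in edges:
        s = synonym_to_canonical.get(subject, subject)
        o = synonym_to_canonical.get(obj, obj)
        merged.add((s, predicate, o))
    return sorted(merged)


# Hypothetical toy data: two source-specific IDs that denote the same concept.
synonyms = {"SRC1:0001": "SRC2:A"}
edges = [
    ("SRC1:0001", "treats", "EX:disease_x"),
    ("SRC2:A", "treats", "EX:disease_x"),
    ("SRC2:A", "interacts_with", "EX:protein_p"),
]
canonical_edges = canonicalize(edges, synonyms)
```

In this toy case the two `treats` edges collapse into one because their subjects map to the same canonical node, which is why a canonicalized graph can have fewer nodes and edges than its pre-canonicalized counterpart.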

2021 ◽  
Author(s):  
Florin Ratajczak ◽  
Mitchell Joblin ◽  
Martin Ringsquandl ◽  
Marcel Hildebrandt

Background: Drug repurposing aims at finding new targets for already developed drugs. It becomes more relevant as the cost of discovering new drugs steadily increases. To find new potential targets for a drug, an abundance of methods and existing biomedical knowledge from different domains can be leveraged. Recently, knowledge graphs have emerged in the biomedical domain that integrate information about genes, drugs, diseases and other biological domains. Knowledge graphs can be used to predict new connections between compounds and diseases, leveraging the interconnected biomedical data around them. While real-world use cases such as drug repurposing are only interested in one specific relation type, widely used knowledge graph embedding models simultaneously optimize over all relation types in the graph. This can lead the models to underfit the data that is most relevant for the desired relation type. We propose a method that leverages domain knowledge in the form of metapaths and uses them to filter two biomedical knowledge graphs (Hetionet and DRKG) for the purpose of improving performance on the prediction task of drug repurposing while simultaneously increasing computational efficiency.
Results: We find that our method reduces the number of entities by 60% on Hetionet and 26% on DRKG, while leading to an improvement in prediction performance of up to 40.8% on Hetionet and 12.4% on DRKG, with an average improvement of 17.5% on Hetionet and 5.1% on DRKG. Additionally, prioritization of antiviral compounds for SARS-CoV-2 improves after task-driven filtering is applied.
Conclusion: Knowledge graphs contain facts that are counterproductive for specific tasks, in our case drug repurposing. We also demonstrate that these facts can be removed, resulting in improved performance on that task and a more efficient learning process.
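
As a rough illustration of metapath-based filtering, the Python sketch below keeps only those entities that lie on at least one path whose sequence of node types matches a given metapath. This is an assumption-laden toy, not the authors' implementation: the function `entities_on_metapath`, the node names, the type labels, and the choice to treat edges as undirected are all hypothetical.

```python
from collections import defaultdict


def entities_on_metapath(node_types, edges, metapath):
    """Return entities lying on at least one path whose node types match `metapath`.

    `node_types` maps node -> type label, `edges` is a list of (u, v) pairs,
    and `metapath` is a list of type labels, e.g. ["Compound", "Gene", "Disease"].
    Edges are treated as undirected (an assumption made for this sketch).
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    kept = set()

    def extend(path):
        # A complete instance of the metapath: keep every node on it.
        if len(path) == len(metapath):
            kept.update(path)
            return
        wanted = metapath[len(path)]
        for nxt in neighbors[path[-1]]:
            if node_types[nxt] == wanted and nxt not in path:
                extend(path + [nxt])

    for node, node_type in node_types.items():
        if node_type == metapath[0]:
            extend([node])
    return kept


# Hypothetical toy graph.
node_types = {
    "c1": "Compound",
    "g1": "Gene",
    "g2": "Gene",
    "d1": "Disease",
    "s1": "SideEffect",
}
edges = [("c1", "g1"), ("g1", "d1"), ("c1", "g2"), ("c1", "s1")]

kept = entities_on_metapath(node_types, edges, ["Compound", "Gene", "Disease"])
filtered_edges = [(u, v) for u, v in edges if u in kept and v in kept]
```

Here `g2` and `s1` are dropped because neither participates in a Compound-Gene-Disease path, so the filtered graph is smaller while retaining the structure relevant to the compound-disease relation.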


eLife ◽  
2020 ◽  
Vol 9 ◽  
Author(s):  
Andra Waagmeester ◽  
Gregory Stupp ◽  
Sebastian Burgstaller-Muehlbacher ◽  
Benjamin M Good ◽  
Malachi Griffith ◽  
...  

Wikidata is a community-maintained knowledge base that has been assembled from repositories in the fields of genomics, proteomics, genetic variants, pathways, chemical compounds, and diseases, and that adheres to the FAIR principles of findability, accessibility, interoperability and reusability. Here we describe the breadth and depth of the biomedical knowledge contained within Wikidata, and discuss the open-source tools we have built to add information to Wikidata and to synchronize it with source databases. We also demonstrate several use cases for Wikidata, including the crowdsourced curation of biomedical ontologies, phenotype-based diagnosis of disease, and drug repurposing.


Genes ◽  
2021 ◽  
Vol 12 (7) ◽  
pp. 998
Author(s):  
Peng Zhang ◽  
Yi Bu ◽  
Peng Jiang ◽  
Xiaowen Shi ◽  
Bing Lun ◽  
...  

This study builds a coronavirus knowledge graph (KG) by merging two information sources. The first source is Analytical Graph (AG), which integrates more than 20 different public datasets related to drug discovery. The second source is CORD-19, a collection of published scientific articles related to COVID-19. We combined chemogenomic entities in AG with entities extracted from CORD-19 to expand knowledge in the COVID-19 domain. Before populating the KG with those entities, we performed entity disambiguation on the CORD-19 collection using Wikidata. Our newly built KG contains at least 21,700 genes, 2500 diseases, 94,000 phenotypes, and other biological entities (e.g., compounds, species, and cell lines). We define 27 relationship types and use them to label each edge in our KG. This research presents two cases to evaluate the KG’s usability: analyzing a subgraph (ego-centered network) from the angiotensin-converting enzyme (ACE) and revealing paths between biological entities (hydroxychloroquine and IL-6 receptor; chloroquine and STAT1). The ego-centered network captured information related to COVID-19. We also found significant COVID-19-related information in top-ranked paths with a depth of three based on our path evaluation.
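
The two evaluation cases, an ego-centered network around one entity and paths between two entities up to a fixed depth, can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names, the placeholder node names, and the toy edges are hypothetical, "depth of three" is interpreted here as at most three edges, and the paper's path ranking step is not reproduced.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def ego_network(adj, center, radius=1):
    """Return all nodes within `radius` hops of `center` (breadth-first search)."""
    distance = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if distance[node] == radius:
            continue
        for nxt in sorted(adj.get(node, ())):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return set(distance)


def simple_paths(adj, source, target, max_depth=3):
    """Enumerate simple paths from `source` to `target` using at most `max_depth` edges."""
    paths = []

    def walk(path):
        if path[-1] == target:
            paths.append(path)
            return
        if len(path) - 1 == max_depth:
            return
        for nxt in sorted(adj.get(path[-1], ())):
            if nxt not in path:
                walk(path + [nxt])

    walk([source])
    return paths


# Hypothetical toy graph with placeholder entity names.
edges = [
    ("compound_x", "protein_1"),
    ("protein_1", "receptor_y"),
    ("compound_x", "protein_2"),
    ("protein_2", "protein_1"),
    ("receptor_y", "disease_z"),
]
adj = build_adjacency(edges)
ego = ego_network(adj, "receptor_y", radius=1)
paths = simple_paths(adj, "compound_x", "receptor_y", max_depth=3)
```

On this toy graph, `ego` contains the center and its direct neighbors, and `paths` contains one two-edge path and one three-edge path between the two chosen entities.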


2021 ◽  
pp. 1-21
Author(s):  
Wenguang Wang ◽  
Yonglin Xu ◽  
Chunhui Du ◽  
Yunwen Chen ◽  
Yijie Wang ◽  
...  

With the development of entity extraction, relationship extraction, knowledge reasoning, and entity linking, knowledge graph technology has been in full swing in recent years. To better promote the development of knowledge graphs, especially in the Chinese language and in the financial industry, we built a high-quality data set, named financial research report knowledge graph (FR2KG), and organized an evaluation on the automated construction of financial knowledge graphs at the 2020 China Knowledge Graph and Semantic Computing Conference (CCKS2020). FR2KG consists of 17,799 entities, 26,798 relationship triples, and 1,328 attribute triples covering 10 entity types, 19 relationship types, and 6 attributes. Participants are required to develop a constructor that will automatically construct a financial knowledge graph based on the FR2KG. In addition, we summarized the technologies for automatically constructing knowledge graphs, and introduced the methods used by the winners and the results of this evaluation.


Author(s):  
Kris Ven ◽  
Jan Verelst

Previous research suggests that the adoption of open source server software (OSSS) may be subject to knowledge barriers. In order to overcome these barriers, organizations should engage in a process of organizational learning. This learning process is facilitated by exposure to external knowledge sources. Unfortunately, this leaves open the question of which factors determine which knowledge sources are used by organizations. In this study, the authors have performed an exploratory study on the determinants of the use of knowledge sources in the adoption of OSSS. The conceptual model developed in this study was based on the absorptive capacity theory. Data was gathered from 95 organizations to empirically investigate this model. Results provide a quite consistent view on how external knowledge sources are used by organizations in the adoption of OSSS. Moreover, results provide more insight into the context in which the adoption of OSSS takes place.


2021 ◽  
pp. 584-595
Author(s):  
Joana Vilela ◽  
Muhammad Asif ◽  
Ana Rita Marques ◽  
João Xavier Santos ◽  
Célia Rasga ◽  
...  

2011 ◽  
Vol 20 (01) ◽  
pp. 30-32
Author(s):  
P. Ruch ◽  

Summary: To summarize current advances of the so-called Web 3.0 and emerging trends of the semantic web. We provide a synopsis of the articles selected for the IMIA Yearbook 2011, from which we attempt to derive a synthetic overview of today’s and future activities in the field. While the state of the research in the field is illustrated by a set of fairly heterogeneous studies, it is possible to identify significant clusters. While the most salient challenge and obsessional target of the semantic web remains its ambition to simply interconnect all available information, it is interesting to observe the developments of complementary research fields such as information sciences and text analytics. The combined expression power and virtually unlimited data aggregation skills of Web 3.0 technologies make it a disruptive instrument to discover new biomedical knowledge. In parallel, such an unprecedented situation creates new threats for patients participating in large-scale genetic studies, as Wjst demonstrates how various data sets can be coupled to re-identify anonymous genetic information. The best paper selection of articles on decision support shows examples of excellent research on methods concerning original development of core semantic web techniques as well as transdisciplinary achievements as exemplified with literature-based analytics. This selected set of scientific investigations also demonstrates the need for computerized applications to transform the biomedical data overflow into more operational clinical knowledge, with potential threats for confidentiality directly associated with such advances. Altogether these papers support the idea that more elaborate computer tools, likely to combine heterogeneous text and data contents, should soon emerge for the benefit of both experimentalists and hopefully clinicians.


2020 ◽  
Vol 27 (10) ◽  
pp. 1606-1611
Author(s):  
Liz Amos ◽  
David Anderson ◽  
Stacy Brody ◽  
Anna Ripple ◽  
Betsy L Humphreys

The US National Library of Medicine regularly collects summary data on direct use of Unified Medical Language System (UMLS) resources. The summary data sources include UMLS user registration data, required annual reports submitted by registered users, and statistics on downloads and application programming interface calls. In 2019, the National Library of Medicine analyzed the summary data on 2018 UMLS use. The library also conducted a scoping review of the literature to provide additional intelligence about the research uses of UMLS as input to a planned 2020 review of UMLS production methods and priorities. 5043 direct users of UMLS data and tools downloaded 4402 copies of the UMLS resources and issued 66 130 951 UMLS application programming interface requests in 2018. The annual reports and the scoping review results agree that the primary UMLS uses are to process and interpret text and facilitate mapping or linking between terminologies. These uses align with the original stated purpose of the UMLS.

