Strategies for Assembling the Biodiversity Knowledge Graph

Author(s):  
Roderic Page

This talk explores different strategies for assembling the “biodiversity knowledge graph” (Page 2016). The first is a centralised, crowd-sourced approach using Wikidata as the foundation. Wikidata is becoming increasingly attractive as a knowledge graph for the life sciences (Waagmeester et al. 2020), and I will discuss some of its strengths and limitations, particularly as a source of bibliographic and taxonomic information. For example, Wikidata’s handling of taxonomy is somewhat problematic given the lack of clear separation of taxa and their names. A second approach is to build biodiversity knowledge graphs from scratch, such as OpenBioDiv (Penev et al. 2019) and my own Ozymandias (Page 2019). These approaches use either generalised vocabularies such as schema.org, or domain specific ones such as TaxPub (Catapano 2010) and the Semantic Publishing and Referencing Ontologies (SPAR) (Peroni and Shotton 2018), and to date tend to have restricted focus, whether geographic (e.g., Australian animals in Ozymandias) or temporal (recent taxonomic literature, OpenBioDiv). A growing number of data sources are now using schema.org to describe their data, including ORCID and Zenodo, and efforts to extend schema.org into biology (Bioschemas) suggest we may soon be able to build comprehensive knowledge graphs using just schema.org and its derivatives. A third approach is not to build an entire knowledge graph, but instead focus on constructing small pieces of the graph tightly linked to supporting evidence, for example via annotations. Annotations are increasingly used to mark up both the biomedical literature (e.g., Kim et al. 2015, Venkatesan et al. 2017) and the biodiversity literature (Batista-Navarro et al. 2017). One could argue that taxonomic databases are essentially lists of annotations (“this name appears in this publication on this page”), which suggests we could link literature projects such as the Biodiversity Heritage Library (BHL) to taxonomic databases via annotations. Given that the International Image Interoperability Framework (IIIF) provides a framework for treating publications themselves as a set of annotations (e.g., page images) upon which other annotations can be added (Zundert 2018), this suggests ways that knowledge graphs could lead directly to visualising the links between taxonomy and the taxonomic literature. All three approaches will be discussed, accompanied by working examples.
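
To make the Wikidata strand concrete, the sketch below (mine, not from the talk) queries the public Wikidata SPARQL endpoint for taxon items. Note that the scientific name is attached to the taxon item as a plain string (property P225), which is exactly the taxon/name conflation discussed above.

```python
# A minimal sketch, assuming the standard Wikidata properties P31 (instance of)
# and P225 (taxon name); endpoint and user agent are illustrative.
import requests

ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
SELECT ?taxon ?name WHERE {
  ?taxon wdt:P31 wd:Q16521 ;   # instance of: taxon
         wdt:P225 ?name .      # taxon name, stored as a literal string
}
LIMIT 10
"""

response = requests.get(
    ENDPOINT,
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "biodiversity-kg-demo/0.1 (example)"},
)
for row in response.json()["results"]["bindings"]:
    print(row["taxon"]["value"], row["name"]["value"])
```

Because the name is only a literal, two taxa sharing a name, or one taxon with several historical names, cannot be modelled as first-class name entities without further conventions.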

Semantic Web ◽  
2020 ◽  
pp. 1-45
Author(s):  
Valentina Anita Carriero ◽  
Aldo Gangemi ◽  
Maria Letizia Mancinelli ◽  
Andrea Giovanni Nuzzolese ◽  
Valentina Presutti ◽  
...  

Ontology Design Patterns (ODPs) have become an established and recognised practice for guaranteeing good quality ontology engineering. There are several ODP repositories where ODPs are shared, as well as ontology design methodologies that recommend their reuse. Rigorous testing is also recommended, to support ontology maintenance and to validate the resulting resource against its motivating requirements. Nevertheless, it is less than straightforward to find guidelines on how to apply such methodologies when developing domain-specific knowledge graphs. ArCo is the knowledge graph of Italian Cultural Heritage (CH) and has been developed using eXtreme Design (XD), an ODP- and test-driven methodology. During its development, XD was adapted to the needs of the CH domain: requirements were gathered from an open, diverse community of consumers, a new ODP was defined, and many existing ones were specialised to address specific CH requirements. This paper presents ArCo and describes how to apply XD to the development and validation of a CH knowledge graph, also detailing the (intellectual) process of matching the modelling problems encountered to ODPs. Relevant contributions also include a novel web tool for supporting unit testing of knowledge graphs, a rigorous evaluation of ArCo, and a discussion of methodological lessons learned during ArCo's development.
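
As an illustration of what test-driven knowledge graph development can look like in practice, here is a minimal sketch, unrelated to the paper's actual web tool and using an invented vocabulary rather than ArCo's real one, that encodes a competency question as a SPARQL ASK unit test run with rdflib.

```python
# A minimal sketch: a competency question becomes a SPARQL ASK query that must
# evaluate to True over the data. Vocabulary and data are hypothetical.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/arco-demo/")  # invented, not ArCo's vocabulary

# Toy data: one cultural property with a title, one without.
g = Graph()
g.add((EX.fresco1, RDF.type, EX.CulturalProperty))
g.add((EX.fresco1, EX.title, Literal("Annunciazione")))
g.add((EX.fresco2, RDF.type, EX.CulturalProperty))

TESTS = [
    ("every cultural property has a title", """
        PREFIX ex: <http://example.org/arco-demo/>
        ASK {
          FILTER NOT EXISTS {
            ?p a ex:CulturalProperty .
            FILTER NOT EXISTS { ?p ex:title ?title }
          }
        }"""),
]

for description, query in TESTS:
    answer = bool(g.query(query).askAnswer)
    print("PASS" if answer else "FAIL", "-", description)  # fresco2 makes this FAIL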


2019 ◽  
Author(s):  
Charles Tapley Hoyt ◽  
Daniel Domingo-Fernández ◽  
Rana Aldisi ◽  
Lingling Xu ◽  
Kristian Kolpeja ◽  
...  

The rapid accumulation of new biomedical literature not only causes curated knowledge graphs to become outdated and incomplete, but also makes manual curation an impractical and unsustainable solution. Automated or semi-automated workflows are necessary to assist in prioritizing and curating the literature to update and enrich knowledge graphs. We have developed two workflows: one for re-curating a given knowledge graph to assure its syntactic and semantic quality, and another for rationally enriching it by manually revising automatically extracted relations for nodes with low information density. We applied these workflows to the knowledge graphs encoded in the Biological Expression Language (BEL) from the NeuroMMSig database, using content pre-extracted from MEDLINE abstracts and PubMed Central full-text articles with text-mining output integrated by INDRA. We have made these workflows freely available at https://github.com/bel-enrichment/bel-enrichment. Database URL: https://github.com/bel-enrichment/results
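
As a rough illustration of the enrichment workflow's prioritization step, the following sketch (assumptions mine, not the authors' code) approximates "information density" by node degree in a toy graph and ranks the sparsest nodes for curation first.

```python
# A minimal sketch: nodes with the fewest relations carry the least information
# and are the best candidates for enrichment with text-mined relations.
import networkx as nx

graph = nx.MultiDiGraph()
# Toy stand-in for a BEL knowledge graph: edges are extracted relations.
graph.add_edges_from([
    ("APP", "GSK3B", {"relation": "increases"}),
    ("GSK3B", "MAPT", {"relation": "increases"}),
    ("APP", "MAPT", {"relation": "association"}),
])

# Rank nodes by ascending degree; curators review the sparsest first.
for node in sorted(graph.nodes, key=graph.degree):
    print(node, graph.degree(node))
```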


2019 ◽  
Vol 5 ◽  
Author(s):  
Joel Sachs ◽  
Roderic Page ◽  
Steven J Baskauf ◽  
Jocelyn Pender ◽  
Beatriz Lujan-Toro ◽  
...  

Knowledge graphs have the potential to unite disconnected digitized biodiversity data, and there are a number of efforts underway to build biodiversity knowledge graphs. More generally, the recent popularity of knowledge graphs, driven in part by the advent and success of the Google Knowledge Graph, has breathed life into the ongoing development of semantic web infrastructure and prototypes in the biodiversity informatics community. We describe a one-week training event and hackathon that focused on applying three specific knowledge graph technologies – the Neptune graph database, Metaphactory, and Wikidata – to a diverse set of biodiversity use cases. We give an overview of the training, the projects that were advanced throughout the week, and the critical discussions that emerged. We believe that the main barriers towards adoption of biodiversity knowledge graphs are the lack of understanding of knowledge graphs and the lack of adoption of shared unique identifiers. Furthermore, we believe an important development for the outlook of knowledge graphs is the emergence of Wikidata as an identifier broker and as a scoping tool. To remedy the current barriers towards biodiversity knowledge graph development, we recommend continued discussions at workshops and at conferences, which we expect to increase awareness and adoption of knowledge graph technologies.
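
As a small illustration of Wikidata acting as an identifier broker, the following sketch (mine, not a hackathon project) resolves an external identifier, here a GBIF taxon ID via property P846, to its Wikidata item, from which links to other databases can be followed.

```python
# A minimal sketch, assuming P846 is the GBIF taxon ID property; the example
# ID value is illustrative.
import requests

QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P846 "2435099" .   # example GBIF taxon ID
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

r = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "kg-workshop-demo/0.1 (example)"},
)
for row in r.json()["results"]["bindings"]:
    print(row["item"]["value"], row["itemLabel"]["value"])
```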


Author(s):  
Roderic Page

This talk explores the role Wikidata (Vrandečić and Krötzsch 2014) might play in the task of assembling biodiversity information into a single, richly annotated and cross-linked structure known as the biodiversity knowledge graph (Page 2016). Initially conceived as a language-independent data store of facts derived from Wikipedia, Wikidata has morphed into a global knowledge graph, complete with a user-friendly interface for data entry and a powerful implementation of the SPARQL query language. Wikidata already underpins projects such as Gene Wiki (Burgstaller-Muehlbacher et al. 2016) and Scholia (Nielsen et al. 2017). Much of the content of Wikispecies is being automatically added to Wikidata, hence many of the entities relevant to biodiversity (such as taxa, taxonomic publications, and taxonomists) are well represented in Wikidata, making it even more attractive. Much of the data relevant to biodiversity is widely scattered across different locations, requiring considerable manual effort to collect and curate. Appeals to the taxonomic community to undertake these tasks have not always met with success. For example, the Global Registry of Biodiversity Repositories (GrBio) was an attempt to create a global list of biodiversity repositories, such as natural history museums and herbaria. An appeal by Schindel et al. (2016) for the taxonomic community to curate this list largely fell on deaf ears, and at the time of writing the GrBio project is moribund. Given that many repositories are housed in institutions that are the subject of articles in Wikipedia, many of these repositories already have entries in Wikidata. Hence, rather than follow the route GrBio took of building a resource and then hoping a community would assemble around it, we could go to Wikidata, where there is an existing community, and build the resource there. An impressive example of this potential is WikiCite, which initially had the goal of including in Wikidata every article cited in any of the Wikipedias. Taxonomic articles are highly cited in Wikipedia (Nielsen 2007) and hence already fall within the remit of WikiCite, which makes Wikidata a candidate for the “bibliography of life” (King et al. 2011), a database of all taxonomic literature. Another important role Wikidata can play is to define the boundaries of a biodiversity knowledge graph. Entities such as journals, articles, people, museums, and herbaria are often already in Wikidata, hence we can delegate managing that content to the Wikidata community (bolstered by our own contributions) and focus instead on domain-specific entities such as DNA sequences and specimens, or on domain-specific attributes of entities that are already in Wikidata. This means we can avoid the inevitable “mission creep” that bedevils any attempt to link together information from multiple disciplines. These ideas are explored using examples based on content entirely within Wikidata (including entities such as publications, authorship, and natural history collections), as well as approaches that combine Wikidata with external knowledge graphs such as Ozymandias (Page 2018).
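
To give a flavour of the WikiCite / “bibliography of life” idea, here is a minimal sketch (the ORCID value is an illustrative assumption, and P496, P50, and Q13442814 are the usual Wikidata identifiers for ORCID iD, author, and scholarly article) that retrieves articles authored by a person identified via their ORCID iD.

```python
# A minimal sketch using SPARQLWrapper against the public Wikidata endpoint.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://query.wikidata.org/sparql",
                       agent="wikicite-demo/0.1 (example)")
sparql.setQuery("""
SELECT ?article ?articleLabel WHERE {
  ?person wdt:P496 "0000-0002-7101-9767" .   # example ORCID iD
  ?article wdt:P50 ?person ;
           wdt:P31 wd:Q13442814 .            # instance of: scholarly article
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["articleLabel"]["value"])
```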


2020 ◽  
Vol 44 (3) ◽  
pp. 516-529
Author(s):  
Sören Auer ◽  
Allard Oelen ◽  
Muhammad Haris ◽  
Markus Stocker ◽  
Jennifer D’Souza ◽  
...  

The transfer of knowledge has not changed fundamentally for many hundreds of years: it is usually document-based, formerly printed on paper as a classic essay and nowadays distributed as PDF. With around 2.5 million new research contributions every year, researchers drown in a flood of pseudo-digitized PDF publications. As a result, research is seriously weakened. In this article, we argue for representing scholarly contributions in a structured and semantic way as a knowledge graph. The advantage is that information represented in a knowledge graph is readable by both machines and humans. As an example, we give an overview of the Open Research Knowledge Graph (ORKG), a service implementing this approach. For creating the knowledge graph representation, we rely on a mixture of manual (crowd/expert sourcing) and (semi-)automated techniques. Only with such a combination of human and machine intelligence can we achieve the quality of representation required to allow for novel exploration and assistance services for researchers. As a result, a scholarly knowledge graph such as the ORKG can be used to give a condensed overview of the state of the art on a particular research question, for example as a tabular comparison of contributions according to various characteristics of the approaches. Further possible intuitive access interfaces to such scholarly knowledge graphs include domain-specific (chart) visualizations or the answering of natural language questions.
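
To illustrate the contrast with PDF-bound prose, the sketch below shows one plausible way to express a contribution as machine-readable statements with rdflib; the vocabulary is invented for illustration and is not the ORKG's actual data model or API.

```python
# A minimal sketch: a research contribution as structured, comparable
# statements rather than prose. Namespace and properties are hypothetical.
from rdflib import Graph, Literal, Namespace, URIRef

EX = Namespace("http://example.org/scholarly/")   # invented vocabulary
g = Graph()

paper = URIRef(EX["paper/example-2013"])
contribution = URIRef(EX["contribution/1"])

g.add((paper, EX.hasContribution, contribution))
g.add((contribution, EX.addressesProblem, Literal("link prediction")))
g.add((contribution, EX.usesMethod, Literal("translation-based embedding")))
g.add((contribution, EX.evaluatedOn, Literal("FB15k")))

# Shared properties across contributions are what make tabular
# state-of-the-art comparisons possible.
print(g.serialize(format="turtle"))
```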


2020 ◽  
Vol 2 (2) ◽  
Author(s):  
Suzanna Schmeelk ◽  
Lixin Tao

Many organizations, to save costs, are moving to the Bring Your Own Mobile Device (BYOD) model and adopting applications built by third parties at an unprecedented rate. Our research examines software assurance methodologies, focusing specifically on the security analysis coverage of program analysis for mobile malware detection, mitigation, and prevention. This research focuses on secure software development of Android applications by developing knowledge graphs for threats reported by the Open Web Application Security Project (OWASP). OWASP maintains lists of the top ten security threats to web and mobile applications. We develop knowledge graphs based on the two most recent top-ten threat years and show how the knowledge graph relationships can be discovered in mobile application source code. We analyze 200+ healthcare applications from GitHub to gain an understanding of their software assurance with respect to one of the OWASP top ten mobile threats, the threat of “Insecure Data Storage.” We find that many of the applications store personally identifying information (PII) in potentially vulnerable places, leaving users exposed to higher risks of losing their sensitive data.
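
As a rough sketch of how “Insecure Data Storage” indicators can be surfaced in source code (the patterns and paths are illustrative assumptions, not the paper's tooling), the following scans Java files for a few well-known Android red flags.

```python
# A minimal sketch: flag common insecure-storage indicators in Android source.
import re
from pathlib import Path

PATTERNS = {
    "world-readable preferences": re.compile(r"MODE_WORLD_READABLE"),
    "external storage write": re.compile(r"getExternalStorageDirectory\s*\("),
    "possible PII in logs": re.compile(r"Log\.\w+\([^)]*(ssn|password|dob)", re.I),
}

def scan(repo: Path) -> None:
    for path in repo.rglob("*.java"):
        text = path.read_text(errors="ignore")
        for label, pattern in PATTERNS.items():
            for match in pattern.finditer(text):
                line = text.count("\n", 0, match.start()) + 1
                print(f"{path}:{line}: {label}")

scan(Path("app/src"))  # hypothetical checkout of an Android project
```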


Electronics ◽  
2021 ◽  
Vol 10 (12) ◽  
pp. 1407
Author(s):  
Peng Wang ◽  
Jing Zhou ◽  
Yuzhang Liu ◽  
Xingchen Zhou

Knowledge graph embedding aims to embed entities and relations into low-dimensional vector spaces. Most existing methods focus only on the triple facts in knowledge graphs. In addition, models based on translation or distance measurement cannot fully represent complex relations. As well-constructed prior knowledge, entity types can be employed to learn the representations of entities and relations. In this paper, we propose a novel knowledge graph embedding model named TransET, which takes advantage of entity types to learn more semantic features. More specifically, circular convolution over the embeddings of entities and entity types is used to map the head and tail entities to type-specific representations, and a translation-based score function is then used to learn the representations of triples. We evaluated our model on real-world datasets with two benchmark tasks, link prediction and triple classification. Experimental results demonstrate that it outperforms state-of-the-art models in most cases.
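
The following sketch (dimensions and initialization assumed; not the authors' code) shows the scoring idea as described in the abstract: circular convolution maps an entity embedding and its type embedding to a type-specific representation, which is then scored with a TransE-style translation distance.

```python
# A minimal sketch of a TransET-style score function with numpy.
import numpy as np

def circular_convolution(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # Circular convolution computed in the Fourier domain.
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def score(h, h_type, r, t, t_type):
    # Map head/tail entities to type-specific representations, then
    # score the triple as a translation; lower is more plausible.
    h_typed = circular_convolution(h, h_type)
    t_typed = circular_convolution(t, t_type)
    return np.linalg.norm(h_typed + r - t_typed, ord=1)

dim = 8
rng = np.random.default_rng(0)
h, r, t, h_type, t_type = (rng.normal(size=dim) for _ in range(5))
print(score(h, h_type, r, t, t_type))
```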


2021 ◽  
Vol 13 (5) ◽  
pp. 124
Author(s):  
Jiseong Son ◽  
Chul-Su Lim ◽  
Hyoung-Seop Shim ◽  
Ji-Sun Kang

Despite the development of various technologies and systems that use artificial intelligence (AI) to solve disaster-related problems, difficult challenges remain. Data are the foundation for solving diverse disaster problems with AI, big data analysis, and so on; therefore, we must focus on these data. Disaster data are domain-specific by disaster type, heterogeneous, and lack interoperability. Open disaster data in particular raise several issues: the sources and formats differ because the data are collected by different organizations, and the vocabularies used in each domain are inconsistent. This study proposes a knowledge graph to resolve the heterogeneity among various disaster data and to provide interoperability among domains. Among disaster domains, we describe a knowledge graph for flooding disasters using Korean open datasets and cross-domain knowledge graphs. Furthermore, the proposed knowledge graph is used to help solve and manage disaster problems.
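
As a minimal illustration of resolving schema heterogeneity (field names and vocabulary are invented for the example, not the paper's model), the sketch below maps two differently structured flood records onto one shared graph vocabulary with rdflib.

```python
# A minimal sketch: two agencies report the same kind of observation under
# different schemas; both are mapped to one hypothetical shared vocabulary.
from rdflib import Graph, Literal, Namespace

DIS = Namespace("http://example.org/disaster#")  # invented shared vocabulary
g = Graph()

# Source A calls it "flood_level"; source B calls it "waterDepth".
record_a = {"station": "Han River 3", "flood_level": 4.2}
record_b = {"sensor": "Seoul-07", "waterDepth": 3.8}

for i, (site_key, level_key, rec) in enumerate([
        ("station", "flood_level", record_a),
        ("sensor", "waterDepth", record_b)]):
    obs = DIS[f"observation/{i}"]
    g.add((obs, DIS.site, Literal(rec[site_key])))
    g.add((obs, DIS.waterLevelMetres, Literal(rec[level_key])))

print(g.serialize(format="turtle"))
```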


2021 ◽  
Vol 11 (15) ◽  
pp. 7104
Author(s):  
Xu Yang ◽  
Ziyi Huan ◽  
Yisong Zhai ◽  
Ting Lin

Nowadays, personalized recommendation based on knowledge graphs has become a hot spot for researchers due to its good recommendation effect. In this paper, we research personalized recommendation based on knowledge graphs. First, we study knowledge graph construction methods and build a movie knowledge graph, using the Neo4j graph database to store the movie data and display it vividly. Then, we study TransE, the classical translation model in knowledge graph representation learning, and improve the algorithm through a cross-training method that uses information from the neighboring feature structures of entities in the knowledge graph; the negative sampling process of the TransE algorithm is also improved. The experimental results show that the improved TransE model can vectorize entities and relations more accurately. Finally, we construct a recommendation model by combining knowledge graphs with learning to rank and neural networks. We propose a Bayesian personalized ranking model based on knowledge graphs (KG-BPR) and a neural network recommendation model based on knowledge graphs (KG-NN). The semantic information of entities and relations in the knowledge graph is embedded into vector space using the improved TransE method, and we compare the results. The item entity vectors containing external knowledge information are integrated into the BPR model and the neural network, respectively, making up for the item's own lack of knowledge information. Finally, an experimental analysis is carried out on the MovieLens-1M dataset. The results show that the two recommendation models proposed in this paper can effectively improve the accuracy, recall, F1 value and MAP value of recommendation.
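
The sketch below (shapes and data invented; a simplification of the KG-BPR idea rather than the paper's model) shows the central mechanism: item vectors are enriched with pretrained TransE entity embeddings before computing the BPR pairwise ranking loss.

```python
# A minimal sketch: BPR scoring with items enriched by TransE embeddings.
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, d_id, d_kg = 4, 6, 8, 8

user_vecs = rng.normal(scale=0.1, size=(n_users, d_id + d_kg))
item_id_vecs = rng.normal(scale=0.1, size=(n_items, d_id))
transe_vecs = rng.normal(scale=0.1, size=(n_items, d_kg))  # pretrained, fixed
item_vecs = np.concatenate([item_id_vecs, transe_vecs], axis=1)

def bpr_loss(u: int, pos: int, neg: int) -> float:
    # BPR maximizes the score margin between an observed (pos) and an
    # unobserved (neg) item for user u.
    x = user_vecs[u] @ (item_vecs[pos] - item_vecs[neg])
    return float(-np.log(1.0 / (1.0 + np.exp(-x))))

print(bpr_loss(0, 1, 2))
```

The knowledge-side vectors compensate for items with sparse interaction data, which is the gap the abstract describes.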

