Semantic Knowledge Graph: Recently Published Documents


Total documents: 11 (five years: 8)
H-index: 3 (five years: 1)

IEEE Software, 2020, Vol. 37 (2), pp. 89-94
Author(s): Bob van Luijt, Micha Verhagen

Author(s): Peter Grobe, Roman Baum, Philipp Bhatty, Christian Köhler, Sandra Meid, ...

The landscape of currently existing repositories of specimen data consists of isolated islands, each applying its own underlying data model. Using standardized protocols such as DarwinCore or ABCD, specimen data and metadata are exchanged and published on web portals such as GBIF. However, data models differ across repositories, which can lead to problems when comparing and integrating content from different systems. For example, one system may have a field labeled 'determination' while another has a field labeled 'taxonomic identification'. Both might refer to the same concept of an organism identification process (e.g., 'obi:organism identification assay'; http://purl.obolibrary.org/obo/OBI_0001624), but the intended meaning of the content is not made explicit, and the providers' understanding of the information might differ from that of the users. Without additional information, data integration across isolated repositories is thus difficult and error-prone, and interoperability and retrievability of data suffer accordingly. Linked Open Data (LOD) promises an improvement: URIs can be used for concepts that are ideally created and accepted by a community and that provide machine-readable meanings. LOD thereby supports the transfer of data into information and then into knowledge, making the data FAIR (Findable, Accessible, Interoperable, Reusable; Wilkinson et al. 2016). Annotating specimen-associated data with LOD therefore seems to be a promising approach to guarantee interoperability across different repositories. However, all currently used specimen collection management systems are based on relational database systems, which lack semantic transparency and thus do not provide easily accessible, machine-readable meanings for the terms used in their data models. As a consequence, transferring their data contents into an LOD framework may lead to loss or misinterpretation of information.
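The label mismatch described above ('determination' vs. 'taxonomic identification') can be sketched in a few lines: both repository-specific fields are annotated with the same ontology URI, so that an integrator can align the records. Everything except the OBI URI (the repository names, field labels, and mapping table) is an illustrative assumption, not an existing system.

```python
# Illustrative sketch: annotating repository-specific field labels with a
# shared ontology URI. Only the OBI URI comes from the text above; the
# repositories, fields, and mapping table are hypothetical.

OBI_ORGANISM_IDENTIFICATION = "http://purl.obolibrary.org/obo/OBI_0001624"

# Each source repository uses its own label for the same concept.
FIELD_TO_CONCEPT = {
    ("repo_a", "determination"): OBI_ORGANISM_IDENTIFICATION,
    ("repo_b", "taxonomic identification"): OBI_ORGANISM_IDENTIFICATION,
}

def annotate(repo, record):
    """Attach a machine-readable concept URI to every field we can map."""
    return {
        field: {"value": value,
                "concept": FIELD_TO_CONCEPT.get((repo, field))}
        for field, value in record.items()
    }

a = annotate("repo_a", {"determination": "Lumbricus terrestris"})
b = annotate("repo_b", {"taxonomic identification": "Lumbricus terrestris"})

# Despite different labels, both records now point to the same concept URI,
# so content from the two isolated systems can be compared.
assert (a["determination"]["concept"]
        == b["taxonomic identification"]["concept"]
        == OBI_ORGANISM_IDENTIFICATION)
```

The mapping table stands in for what a community-accepted vocabulary provides: a shared, machine-readable anchor that survives differences in local field naming.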
This discrepancy between LOD and relational databases results from the lack of semantic transparency and machine-readability of data in relational databases. Storing specimen collection data as semantic Knowledge Graphs provides both. Semantic Knowledge Graphs are graphs based on the ‘Subject – Property – Object’ syntax of the Resource Description Framework (RDF). The ‘Subject’ and ‘Property’ positions are taken by URIs, and the ‘Object’ position can be taken either by a URI or by a label or value. Since a given URI can take the ‘Subject’ position in one RDF statement and the ‘Object’ position in another, several RDF statements can be connected to form a directed labeled graph, i.e. a semantic graph. In a semantic Knowledge Graph, each described specimen and its parts and properties possess their own URI and can thus be individually referenced. These URIs are used to describe the respective specimen and its properties using the RDF syntax. Additional RDF statements specify the ontology class that each part and property instantiates. The reference to the URIs of the instantiated ontology classes guarantees the Findability, Interoperability, and Reusability of the information contained in semantic Knowledge Graphs. Specimen collection data contained in semantic Knowledge Graphs can be made Accessible in a human-readable form through an interface and in a machine-readable form through a SPARQL endpoint (https://en.wikipedia.org/wiki/SPARQL). As a consequence, semantic Knowledge Graphs comply with the FAIR guiding principles. Because the semantic Knowledge Graph of each specimen in the collection uses URIs, it is also available as LOD. With semantic Morph·D·Base, we have implemented a prototype of this approach that is based on Semantic Programming. We present the prototype and discuss different aspects of how specimen collection data are handled.
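The ‘Subject – Property – Object’ chaining described above can be illustrated without any RDF library: a URI that appears in the Object position of one statement takes the Subject position in another, which is exactly what connects separate statements into a directed labeled graph. All URIs in this sketch are placeholders, not real Morph·D·Base identifiers.

```python
# Minimal sketch of RDF-style statements as plain (subject, property, object)
# tuples. All URIs are illustrative placeholders.

EX = "https://example.org/"

triples = [
    # A URI in the Object position of one statement ...
    (EX + "specimen/42", EX + "hasPart", EX + "specimen/42/head"),
    # ... takes the Subject position in another, linking the statements
    # into a directed labeled graph.
    (EX + "specimen/42/head", EX + "hasLength", "4.2 mm"),
    # An additional statement asserts which ontology class is instantiated.
    (EX + "specimen/42", "rdf:type", EX + "ontology/Specimen"),
]

def objects_of(subject, prop):
    """Follow one labeled edge of the graph from a subject URI."""
    return [o for s, p, o in triples if s == subject and p == prop]

# Traverse: specimen -> its part -> the part's measured value.
part = objects_of(EX + "specimen/42", EX + "hasPart")[0]
print(objects_of(part, EX + "hasLength"))  # -> ['4.2 mm']
```

A SPARQL endpoint generalizes this kind of traversal: a query pattern matches against the stored triples instead of a hand-written lookup function.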
By using community-created terminologies and standardized methods for the contents created (e.g. species identification), as well as URIs for each expression, we make the data and metadata semantically transparent and communicable. The source code for Semantic Programming and for semantic Morph·D·Base is available from https://github.com/SemanticProgramming. The prototype of semantic Morph·D·Base can be accessed at https://proto.morphdbase.de.


Author(s): Lars Vogt, Sören Auer, Thomas Bartolomaeus, Pier Luigi Buttigieg, Peter Grobe, ...

We would like to present FAIR Research Data: Semantic Knowledge Graph Infrastructure for the Life Sciences (in short, FAIR.ReD), a project initiative that is currently being evaluated for funding. FAIR.ReD is a software environment for developing data management solutions according to the FAIR (Findable, Accessible, Interoperable, Reusable; Wilkinson et al. 2016) data principles. It utilizes what we call a Data Sea Storage, which adopts the Data Lake idea of decoupling data storage from data access but modifies it by storing data in a semantically structured format, as either semantic graphs or semantic tables, instead of storing them in their native form. Storage follows a top-down approach, resulting in a standardized storage model that allows sharing data across all FAIR.ReD Knowledge Graph Applications (KGAs) connected to the same Sea, with newly developed KGAs automatically gaining access to all contents in the Sea. In contrast, access to and export of data follow a bottom-up approach that allows the specification of additional data models to meet the varying domain-specific and programmatic needs for accessing structured data. The FAIR.ReD engine enables bidirectional data conversion between the two storage models and any additional data model, which will substantially reduce the conversion workload for data-rich institutes (Fig. 1). Moreover, with the possibility to store data in semantic tables, FAIR.ReD provides high-performance storage for incoming data streams such as sensor data. FAIR.ReD KGAs are modularly organized. Modules can be edited using the FAIR.ReD editor and combined to form coherent KGAs. The editor allows domain experts to develop their own modules and KGAs without requiring any programming experience, thus also allowing smaller projects and individual researchers to build their own FAIR data management solution.
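The bidirectional conversion between the two storage shapes can be sketched as a lossless round-trip between rows of a semantic table and triples of a semantic graph. The predicate keys and converter functions below are hedged assumptions for illustration only; they are not the actual FAIR.ReD storage model or engine API.

```python
# Hypothetical sketch: the same records held as a "semantic table" (rows)
# and as a "semantic graph" (triples), with a lossless round-trip between
# them. Predicate names are illustrative placeholders.

def table_to_graph(rows, subject_key):
    """Flatten table rows into (subject, predicate, object) triples."""
    return [
        (row[subject_key], key, value)
        for row in rows
        for key, value in row.items()
        if key != subject_key
    ]

def graph_to_table(triples, subject_key):
    """Regroup triples into rows keyed by their subject."""
    rows = {}
    for s, p, o in triples:
        rows.setdefault(s, {subject_key: s})[p] = o
    return list(rows.values())

rows = [
    {"id": "sensor/1", "ex:temperature": "21.5", "ex:unit": "Celsius"},
]
triples = table_to_graph(rows, "id")
# The round-trip reproduces the original table exactly.
assert graph_to_table(triples, "id") == rows
```

The table shape keeps fast row-wise writes for incoming streams, while the triple shape exposes the same content to graph queries; a converter like this is what lets one Sea serve both access patterns.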
Contents from FAIR.ReD KGAs can be published under a Creative Commons license as documents, micropublications, or nanopublications, each receiving its own DOI. A publication life cycle implemented in FAIR.ReD allows updating published contents for corrections or additions without overwriting the originally published version. Together with the fact that data and metadata are semantically structured and machine-readable, this means all contents from FAIR.ReD KGAs will comply with the FAIR Guiding Principles. Because all FAIR.ReD KGAs provide access to semantic knowledge graphs in both a human-readable and a machine-readable version, FAIR.ReD seamlessly integrates the complex RDF (Resource Description Framework) world with a more intuitively comprehensible presentation of data in the form of data entry forms, charts, and tables. Guided by use cases, the FAIR.ReD environment will be developed using semantic programming, where the source code of an application is stored in its own ontology. The set of source code ontologies of a KGA and its modules provides the steering logic for running the KGA. With this clear separation of steering logic from interpretation logic, semantic programming follows the idea of separating the main layers of an application, analogous to the separation of interpretation logic and presentation logic. Each KGA and module is specified exactly this way, and their source code ontologies are stored in the Data Sea. Thus, all data and metadata are semantically transparent, and so is the data management application itself, which substantially improves sustainability on all levels of data processing and storage.
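The publication life cycle described above, updating published contents without overwriting the originally published version, can be modeled minimally as an append-only version list per DOI. The class and versioning scheme here are a hypothetical sketch, not FAIR.ReD's implementation.

```python
# Hypothetical sketch of a non-overwriting publication life cycle:
# corrections append a new version, and every earlier version, including
# the original, stays retrievable under the same DOI.

class Publication:
    def __init__(self, doi, content):
        self.doi = doi
        # Index 0 is the originally published version; it is never replaced.
        self.versions = [content]

    def update(self, content):
        """Publish a correction or addition as a new version."""
        self.versions.append(content)

    def latest(self):
        return self.versions[-1]

    def original(self):
        return self.versions[0]

pub = Publication("10.0000/example.1", "first published text")
pub.update("corrected text")
assert pub.original() == "first published text"  # never overwritten
assert pub.latest() == "corrected text"
```

Keeping the full version history is what makes a published record citable and correctable at the same time: a citation to the DOI can always resolve to the exact version it referenced.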

