triple store
Recently Published Documents

TOTAL DOCUMENTS: 52 (five years: 16)
H-INDEX: 6 (five years: 1)

2022 · Vol 11 (1) · pp. 51
Author(s): Alexandra Rowland, Erwin Folmer, Wouter Beek, Rob Wenneker

Kadaster, the Dutch National Land Registry and Mapping Agency, has been actively publishing its base registers as linked (open) spatial data for several years. To date, a number of these base registers, as well as several external datasets, have been successfully published as linked data and are publicly available. Increasing demand for linked data products and the availability of new linked data technologies have highlighted the need for a new, innovative approach to linked data publication within the organisation, in the interest of reducing the time and costs associated with publication. The new approach is novel both in how datasets are modelled and transformed and in its publication architecture. In modelling whole datasets, a clear distinction is made between the Information Model and the Knowledge Model, in order to capture organisation-specific requirements while also supporting external community standards in the publication process. The publication architecture consists of several steps in which instance data are loaded from their source as GML, transformed using an Enhancer, and published to the triple store. Both the modelling approach and the publication architecture form part of Kadaster's larger vision for the development of the Kadaster Knowledge Graph through the integration of the various linked datasets.
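
As a rough illustration of the final step in such a publication pipeline, the Python sketch below pushes a few transformed triples into a triple store over the SPARQL Graph Store Protocol. The endpoint URL, graph IRI, namespace and property names are illustrative assumptions, not Kadaster's actual configuration.

```python
# Minimal sketch: build a few triples for a cadastral parcel and publish them
# to a named graph in a triple store via the SPARQL Graph Store Protocol.
# Namespace, parcel IRI and endpoint URL are hypothetical.
import requests
from rdflib import Graph, Literal, Namespace, URIRef

KAD = Namespace("https://example.org/kadaster/def#")  # hypothetical Knowledge Model namespace

g = Graph()
parcel = URIRef("https://example.org/kadaster/id/parcel/12345")
g.add((parcel, KAD.parcelNumber, Literal("12345")))
g.add((parcel, KAD.municipality, Literal("Enschede")))

# PUT the serialised graph into the store (Graph Store Protocol endpoint assumed).
response = requests.put(
    "https://example.org/graph-store",
    params={"graph": "https://example.org/kadaster/graph/parcels"},
    data=g.serialize(format="turtle").encode("utf-8"),
    headers={"Content-Type": "text/turtle; charset=utf-8"},
)
response.raise_for_status()
```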


Author(s): Laura Pandolfo, Luca Pulina

Using semantic web technologies is becoming an efficient way to overcome metadata storage and data integration problems in digital archives, thus enhancing the accuracy of the search process and leading to the retrieval of more relevant results. In this paper, the results of the implementation of the semantic layer of the Józef Piłsudski Institute of America digital archive are presented. In order to represent and integrate data about the archival collections housed by the institute, the authors developed arkivo, an ontology that not only accommodates the archival description of records but also provides a reference schema for publishing linked data. The authors describe the application of arkivo to the digitized archival collections of the institute, with emphasis on how these resources have been linked to external datasets in the linked data cloud. They also show the results of an experiment focused on the query answering task involving a state-of-the-art triple store system. The dataset related to the Piłsudski Institute archival collections has been made available for ontology benchmarking purposes.
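
A minimal sketch of such a query-answering task is shown below, run against a locally hosted copy of the dataset. The endpoint URL and the use of dcterms:isPartOf to link records to collections are assumptions for illustration; the actual arkivo terms may differ.

```python
# Sketch: count archival records per collection in a triple store loaded with
# the arkivo-described dataset. Endpoint and property IRIs are illustrative.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://localhost:3030/arkivo/sparql")  # assumed local endpoint
sparql.setQuery("""
    PREFIX dcterms: <http://purl.org/dc/terms/>
    SELECT ?collection (COUNT(?record) AS ?records)
    WHERE { ?record dcterms:isPartOf ?collection . }
    GROUP BY ?collection
    ORDER BY DESC(?records)
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["collection"]["value"], row["records"]["value"])
```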


Author(s): Reto Gmür, Donat Agosti

Taxonomic treatments are sections of publications that document the features or distribution of a related group of organisms (called a "taxon", plural "taxa") in ways adhering to highly formalized conventions; published in scientific journals, they shape our understanding of global biodiversity (Catapano 2019). Treatments are the building blocks of the evolving scientific consensus on taxonomic entities. The semantics of these treatments and their relationships are highly structured: taxa are introduced, merged, made obsolete, split, renamed, associated with specimens, and so on. Plazi makes this content available in machine-readable form using the Resource Description Framework (RDF). RDF is the standard model for Linked Data and the Semantic Web, and it can be exchanged in different formats (i.e., concrete syntaxes) such as RDF/XML or Turtle. The data model describes graph structures and relies on Internationalized Resource Identifiers (IRIs); ontologies such as the Darwin Core basic vocabulary are used to assign meaning to the identifiers. For Synospecies, we unite all treatments into one large knowledge graph, modelling taxonomic knowledge and its evolution with complete references to quotable treatments. This knowledge graph expresses much more than any individual treatment could convey, because every referenced entity is linked to every other relevant treatment. On synospecies.plazi.org, we provide a user-friendly interface to find the names and treatments related to a taxon. An advanced mode allows queries to be executed using the SPARQL query language.
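
The sketch below illustrates the kind of SPARQL lookup such an interface can run, retrieving treatments that define taxon concepts within a given genus. The endpoint URL and the Plazi treatment and Darwin Core property IRIs are assumptions for illustration and may not match the vocabulary actually used.

```python
# Sketch: ask a treatment knowledge graph for treatments that define taxon
# concepts within a given genus. Endpoint and property IRIs are assumptions.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://example.org/plazi/sparql")  # hypothetical endpoint
sparql.setQuery("""
    PREFIX trt: <http://plazi.org/vocab/treatment#>
    PREFIX dwc: <http://rs.tdwg.org/dwc/terms/>
    SELECT ?treatment ?taxon
    WHERE {
        ?treatment trt:definesTaxonConcept ?taxon .
        ?taxon dwc:genus "Carabus" .
    }
    LIMIT 20
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["treatment"]["value"], row["taxon"]["value"])
```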


2021
Author(s): Hashim Khan, Manzoor Ali, Axel-Cyrille Ngonga Ngomo, Muhammad Saleem

With the significant growth in RDF datasets, application developers demand online availability of these datasets to meet end users' expectations. Various interfaces are available for querying RDF data with the SPARQL query language. Studies show that SPARQL endpoints may provide high query runtime performance at the cost of low availability; for example, it has been observed that only 32.2% of public endpoints have a monthly uptime of 99–100%. One possible reason for this low availability is the high workload experienced by these SPARQL endpoints. Because complete query execution is performed on the server side (i.e., at the SPARQL endpoint), this high query processing workload may result in performance degradation or even a service shutdown. We performed extensive experiments to measure the query processing capabilities of well-known triple stores through their SPARQL endpoints. In particular, we stressed these triple stores with multiple parallel requests from different querying agents. Our experiments reveal the maximum query processing capabilities of these triple stores, beyond which they experience service shutdowns. We hope this analysis will help triple store developers design workload-aware RDF engines that improve the availability and throughput of their public endpoints.
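
A minimal sketch of this stress-testing idea follows: several querying agents fire the same SPARQL query at an endpoint in parallel, and the success rate and mean latency are recorded. The endpoint URL, query and agent count are placeholders rather than the paper's experimental setup.

```python
# Sketch: N querying agents issue the same SPARQL query in parallel and we
# record how many requests succeed and the mean response time.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

ENDPOINT = "http://localhost:8890/sparql"  # placeholder endpoint
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 1000"
AGENTS = 32

def one_request(_):
    start = time.perf_counter()
    try:
        r = requests.get(ENDPOINT, params={"query": QUERY}, timeout=60,
                         headers={"Accept": "application/sparql-results+json"})
        ok = r.status_code == 200
    except requests.RequestException:
        ok = False
    return ok, time.perf_counter() - start

with ThreadPoolExecutor(max_workers=AGENTS) as pool:
    results = list(pool.map(one_request, range(AGENTS)))

succeeded = sum(ok for ok, _ in results)
mean_latency = sum(t for _, t in results) / len(results)
print(f"{succeeded}/{AGENTS} requests succeeded, mean latency {mean_latency:.2f}s")
```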


Author(s): Ismaël Rafaï, Sébastien Duchêne, Eric Guerci, Irina Basieva, Andrei Khrennikov

2021
Author(s): Jørgen Tulstrup, Andrej Vihtelič, Ángel Prieto Martín, Martin Schiegl, Dana Čápová, ...

The GeoERA Information Platform Project (GIP-P) is establishing a new common platform for organising, disseminating and sustaining digital, harmonised data from the GeoERA geoscientific projects on subsurface energy, water and raw material resources from all over Europe. The platform is part of EuroGeoSurveys' European Geological Data Infrastructure (EGDI), which takes care of its future sustainability.

Great effort is being put into making the data Findable, Accessible, Interoperable and Reusable (FAIR) in one place, and thereby as valuable as possible for stakeholders.

The platform consists of web applications, services, databases, a digital repository, and open-source tools and services that connect them all together. The main components are:

EGDI Metadata Catalogue: based on the MIcKA system for the management and publication of metadata on structured data and services. This technology enables entry, editing, harvesting, discovery, and viewing of metadata, including tools for the compilation and export of metadata in standardised formats.

EGDI Central Database: stores structured data for the other main components in PostgreSQL. It contains geospatial data as well as the configuration of various maps and layers, datasets uploaded by individual projects, metadata for unstructured documents and DOI links, search system data, 3D model data, etc.

European Geoscience Registry: stores and publishes controlled vocabularies for the whole GeoERA program, supporting multilingual semantic text search and project-specific knowledge concepts. It is a Linked Data Platform based on the Jena RDF triple store.

Digital Archive: stores unstructured data (PDFs, DOI document links, images or comma-separated files) and their metadata. It consists of a controlled part of the file system, the Solr search engine, the repository database, and a thematic repository search web application that lets the user perform an advanced search through the repository, with terms auto-suggested from the RDF triple store and ranked results.

Search System: allows geoscientific information from all of the above modules to be discovered. Its search logic lets users find datasets, see their metadata, access their available online distributions (specific representations of a dataset), and select and display subsets of features from those datasets. Search results are ordered by relevance.

The platform also contains modules for harvesting data in INSPIRE-compliant formats, for disseminating geospatial data via web GIS, a 3D geological model viewer, and modules for uploading data and for monitoring services.

The presentation will focus on describing the various components and their individual functions, and in particular on how the search system interacts with all of them. This makes it possible to get useful results from the very diverse data sources in the system.
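
As an illustration of the multilingual semantic text search that such a vocabulary registry enables, the sketch below matches SKOS concept labels against a search term via a SPARQL endpoint. The endpoint URL is hypothetical, and the real registry's vocabulary structure may differ.

```python
# Sketch: text search over SKOS concept labels in a vocabulary triple store.
# Endpoint URL is hypothetical.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://example.org/egdi/vocabulary/sparql")  # hypothetical endpoint
sparql.setQuery("""
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    SELECT ?concept ?label (LANG(?label) AS ?lang)
    WHERE {
        ?concept skos:prefLabel ?label .
        FILTER(CONTAINS(LCASE(STR(?label)), "aquifer"))
    }
    LIMIT 25
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["concept"]["value"], row["label"]["value"], row.get("lang", {}).get("value", ""))
```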


2021
Author(s): Alex Vermeulen, Margareta Hellström, Oleg Mirzov, Ute Karstens, Claudio D'Onofrio, ...

The Integrated Carbon Observation System (ICOS) provides long-term, high-quality observations that follow (and cooperatively set) the global standards for the best possible quality data on atmospheric composition for greenhouse gases (GHG), greenhouse gas exchange fluxes measured by eddy covariance, and CO2 partial pressure at water surfaces. The ICOS observational data feed into a wide area of science covering, for example, plant physiology, agriculture, biology, ecology, energy & fuels, forestry, hydrology, (micro)meteorology, environmental science, oceanography, geochemistry, physical geography, remote sensing, earth, climate and soil science, and combinations of these in multi-disciplinary projects.

As ICOS is committed to providing all data and methods openly and transparently as free data, a dedicated system is needed to secure the long-term archiving and availability of the data, together with the descriptive metadata that belongs to the data and is needed to find, identify, understand and properly use it, also in the far future, following the FAIR data principles. An added requirement is that the full data lifecycle should be completely reproducible, to enable full trust in the observations and the derived data products.

In this presentation we introduce the ICOS operational data repository, named ICOS Carbon Portal, which is based on the linked open data approach. All metadata are modelled in an ontology coded in OWL and held in an RDF triple store that is available through an open SPARQL endpoint. The repository supports versioning and collections, and models provenance through a simplified PROV-O ontology. All data objects are ingested under strict control for the identified data types, on provision of the correct and sufficient (provenance) metadata, data format and data integrity. All data, including raw data, are stored in the trusted long-term repository B2SAFE with two replicas. On top of the triple store and SPARQL endpoint we have built a series of services, APIs and graphical interfaces that allow machine-to-machine and user interaction with the data and metadata. Examples are a full faceted search with a connected data cart and download facility, previews of higher-level data products (time series of point observations and spatial data), and cloud computing services such as eddy covariance data processing and on-demand atmospheric footprint calculations, all connected to the observational data from ICOS. Another interesting development is the community support for scientific workflows using Jupyter notebook services that connect to our repository through a dedicated Python library for direct metadata and data access.
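
A minimal sketch of such programmatic metadata access is shown below, listing a few data objects and their file names via a SPARQL endpoint. The endpoint URL and the cpmeta property name are assumptions for illustration and may not match the portal's actual ontology terms.

```python
# Sketch: list a few data objects and their file names from the metadata store.
# The endpoint URL and the cpmeta property are assumptions for illustration.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://meta.icos-cp.eu/sparql")  # assumed public endpoint
sparql.setQuery("""
    PREFIX cpmeta: <http://meta.icos-cp.eu/ontologies/cpmeta/>
    SELECT ?dataObject ?fileName
    WHERE { ?dataObject cpmeta:hasName ?fileName . }
    LIMIT 10
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["dataObject"]["value"], row["fileName"]["value"])
```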


Author(s): Maarten Trekels, Matt Woodburn, Deborah L Paul, Sharon Grant, Kate Webbink, ...

Data standards allow us to aggregate, compare, compute and communicate data from a wide variety of origins. However, for historical reasons, data are likely to be stored in many different formats and conform to different models. Every dataset might contain a huge amount of information, but comparison becomes tremendously difficult without a common way to represent the data. That is where standards development comes in.

Developing a standard is a formidable process, often involving many stakeholders. Typically, the initial blueprint of a standard is created by a limited number of people who have a clear view of their use cases. However, as development continues, additional stakeholders participate in the process. As a result, conflicting opinions and interests influence the development of the standard. Compromises need to be made, and the standard might end up looking very different from the initial concept. In order to address the needs of the community, a high level of engagement in the development process is encouraged; however, this does not necessarily increase the usability of the standard. To mitigate this, the standard needs to be tested during the early stages of development.

To facilitate this, we explored the use of Wikibase to create an initial implementation of the standard. Wikibase is the underlying technology that drives Wikidata. The software is open source and can be customised for creating collaborative knowledge bases. In addition to containing an RDF (Resource Description Framework) triple store under the hood, it provides users with an easy-to-use graphical user interface (see Fig. 1), which makes an implementation of a standard usable by non-technical users. The Wikibase remains fully flexible in the way data are represented, and no data model is enforced, which allows users to map their data onto the standard without any restrictions. Information can be retrieved from RDF data through the SPARQL query language (W3C 2020). The software package also has a built-in SPARQL endpoint, allowing users to extract the relevant information and answer questions such as: Does the standard cover all of the envisioned use cases? Are parts of the standard underdeveloped? Are the controlled vocabularies sufficient to describe the data?

This strategy was applied during the development of the TDWG Collection Description standard. After completing a rough version of the standard, the terms defined in the first version were transferred to a Wikibase instance running on WBStack (Addshore 2020). Initially, collection data were entered manually, which revealed several issues. The Wikibase allowed us to easily define controlled vocabularies and expand them as needed. The feedback reported by users then flowed back into the further development of the standard. We currently envisage creating automated scripts that will import data en masse from collections. Using the SPARQL query interface, it will then be straightforward to ensure that data can be extracted from the Wikibase to support the envisaged use cases.
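
The sketch below illustrates the kind of gap analysis such a built-in SPARQL endpoint makes possible: finding items that lack a value for a chosen standard term. The endpoint URL and property IDs belong to a hypothetical test Wikibase, not the actual instance used.

```python
# Sketch: find collection-description items that have no value for a chosen
# standard term. Endpoint URL and property IDs belong to a hypothetical Wikibase.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://example-collections.wiki/query/sparql")  # hypothetical
sparql.setQuery("""
    PREFIX wdt: <https://example-collections.wiki/prop/direct/>
    SELECT ?item
    WHERE {
        ?item wdt:P1 ?collectionType .                   # hypothetical "instance of" property
        FILTER NOT EXISTS { ?item wdt:P11 ?anyValue . }  # hypothetical standard term
    }
    LIMIT 50
""")
sparql.setReturnFormat(JSON)
missing = sparql.query().convert()["results"]["bindings"]
print(len(missing), "items lack a value for the chosen term")
```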


Author(s): Mariya Dimitrova, Jorrit Poelen, Georgi Zhelezov, Teodor Georgiev, Donat Agosti, ...

Introduction
Scholarly literature is the primary source of biodiversity knowledge based on observations, field work, analysis and taxonomic classification. Publishing such literature in semantically enhanced formats (e.g., through Extensible Markup Language (XML) tagging) helps to make this knowledge easily accessible and available to humans and actionable by computers. A recent collaboration between Pensoft Publishers and Global Biotic Interactions (GloBI) (Poelen et al. 2014) demonstrates how semantically published literature can be used to extract species interactions from tables published in article narratives (Dimitrova et al. 2020) (Fig. 1).

Methods
Biotic interactions were extracted from scholarly literature tables published in several biodiversity journals from Pensoft. Semantically enhanced publications were processed to extract the tables from the article XMLs, yielding 6993 tables from 21 different journals. Using the Pensoft Annotator, a text-to-ontology mapping tool, we were able to detect tables that could contain biotic interactions. The Pensoft Annotator was used together with a modified subset of the OBO Foundry Relation Ontology (RO), concentrating on the term labelled 'biotically interacts with' and all its children. The contents and captions of all tables were run through the Pensoft Annotator, which returned the matching ontology terms and their positions in the text. The resulting subset of tables was then processed by GloBI, which parsed the tables to extract the taxonomic names participating in each interaction. The GloBI workflow also generated table citations through SPARQL queries to the OpenBiodiv triple store, where all table and article metadata are stored (Penev et al. 2019). OpenBiodiv was also used as a taxon name knowledge base to expand the taxon hierarchy in the tables and to guide the merging of overlapping taxon hierarchies in a single row (e.g., host plant family + host plant species -> host plant species). Taxon name resolution of species interactions was done under the assumption that two non-overlapping taxa are found in a single column. The exact interaction types between the species were not determined; instead, the general term labelled "interacts with" was used.

Results
Annotation of biotic interactions via the Pensoft Annotator identified 233 tables possibly containing biotic interactions out of the 6993 tables that were processed. Semantic annotation of taxonomic names within tables allowed GloBI to index the species, including their complete taxonomic hierarchies. Currently, GloBI has indexed 2378 interactions, extracted from a subset of 46 of the 233 tables. Interactions extracted via this workflow are available on a dedicated page on GloBI's website. Records of the communication behind this collaborative work between GloBI and Pensoft are publicly available.

Discussion & Conclusion
One of the limitations of the workflow was the inability to detect the directionality of the interactions; in other words, the tables do not contain information about the subject and object of a given interaction. For instance, in a host–parasite interaction, we cannot automatically detect which species is the host and which is the parasite. We plan to address this issue by performing semantic analysis (e.g., part-of-speech tagging) of the table captions to determine the exact subjects and objects in the interactions. In addition, complicated table structures impeded both the processing of tables by the Pensoft Annotator and their parsing by GloBI's algorithms. We recognise the importance of adopting common formats for sharing interaction data, a practice that would greatly improve the post-publication indexing of tables by GloBI. An example of a standardised table structure is the standard table template for primary biodiversity data introduced by Pensoft (Penev et al. 2020). The template helps authors create semantically enhanced tables, which in turn enables direct harvesting and conversion to interlinked FAIR (Findable, Accessible, Interoperable, and Reusable) data. The indexing of biotic interactions by GloBI and Pensoft demonstrates the advantages of storing semantically enhanced data in tables, and adoption of the standard table template for primary biodiversity data would improve our ability to extract biotic interactions and to transform scholarly narrative into fully interoperable Linked Open Data.
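
As a toy illustration of the text-to-ontology matching step (not the Pensoft Annotator itself), the sketch below flags tables whose captions or cells mention labels drawn from the 'biotically interacts with' branch of the Relation Ontology. The label list is a small hand-picked assumption rather than the full RO subset used in the workflow.

```python
# Toy illustration (not the Pensoft Annotator): flag tables whose captions or
# cells mention labels from the 'biotically interacts with' branch of the
# Relation Ontology. The label list is a small hand-picked assumption.
INTERACTION_LABELS = [
    "interacts with", "preys on", "parasite of", "pollinates",
    "host of", "visits flowers of",
]

def table_mentions_interaction(caption: str, cells: list[str]) -> bool:
    """Return True if any interaction label occurs in the caption or the cell text."""
    text = " ".join([caption, *cells]).lower()
    return any(label in text for label in INTERACTION_LABELS)

tables = [
    {"caption": "Table 2. Host plants and parasitoids recorded for each weevil species.",
     "cells": ["Curculio glandium", "Quercus robur", "parasite of"]},
    {"caption": "Table 3. Collecting localities and coordinates.",
     "cells": ["Sofia", "42.69 N, 23.32 E"]},
]
candidates = [t for t in tables if table_mentions_interaction(t["caption"], t["cells"])]
print(f"{len(candidates)} of {len(tables)} tables may contain biotic interactions")
```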

