Synospecies, an application to reflect changes in taxonomic names based on a triple store of taxonomic data liberated from publications

Author(s):  
Reto Gmür ◽  
Donat Agosti

Taxonomic treatments, sections of publications documenting the features or distribution of a related group of organisms (called a “taxon”, plural “taxa”) in ways adhering to highly formalized conventions, and published in scientific journals, shape our understanding of global biodiversity (Catapano 2019). Treatments are the building blocks of the evolving scientific consensus on taxonomic entities. The semantics of these treatments and their relationships are highly structured: taxa are introduced, merged, made obsolete, split, renamed, associated with specimens and so on. Plazi makes this content available in machine-readable form using the Resource Description Framework (RDF). RDF is the standard model for Linked Data and the Semantic Web. RDF can be exchanged in different formats (also known as concrete syntaxes) such as RDF/XML or Turtle. The data model describes graph structures and relies on Internationalized Resource Identifiers (IRIs); ontologies such as the Darwin Core basic vocabulary are used to assign meaning to the identifiers. For Synospecies, we unite all treatments into one large knowledge graph, modelling taxonomic knowledge and its evolution with complete references to quotable treatments. However, this knowledge graph expresses much more than any individual treatment could convey, because every referenced entity is linked to every other relevant treatment. On synospecies.plazi.org, we provide a user-friendly interface to find the names and treatments related to a taxon. An advanced mode allows execution of queries using the SPARQL query language.
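As a minimal sketch of the kind of query the advanced mode supports, the snippet below looks up taxon names in a SPARQL endpoint using a Darwin Core term. The endpoint URL is a placeholder and the exact property IRIs used by the Synospecies knowledge graph are assumptions, not confirmed by the abstract.

```python
# Minimal sketch: querying a SPARQL endpoint for taxon names via a
# Darwin Core-style property. Endpoint URL and property choice are
# illustrative assumptions.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "https://example.org/sparql"  # hypothetical endpoint URL

query = """
PREFIX dwc: <http://rs.tdwg.org/dwc/terms/>
SELECT DISTINCT ?taxon ?name WHERE {
  ?taxon dwc:scientificName ?name .
  FILTER(CONTAINS(LCASE(?name), "formica"))
}
LIMIT 20
"""

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery(query)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

for row in results["results"]["bindings"]:
    print(row["taxon"]["value"], row["name"]["value"])
```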

F1000Research ◽  
2019 ◽  
Vol 8 ◽  
pp. 1677
Author(s):  
Toshiaki Katayama ◽  
Shuichi Kawashima ◽  
Gos Micklem ◽  
Shin Kawano ◽  
Jin-Dong Kim ◽  
...  

Publishing databases in the Resource Description Framework (RDF) model is becoming widely accepted to maximize the syntactic and semantic interoperability of open data in life sciences. Here we report advancements made in the 6th and 7th annual BioHackathons, which were held in Tokyo and Miyagi, respectively. This review consists of two major sections covering: 1) improvement and utilization of RDF data in various domains of the life sciences, and 2) metadata about these RDF data, the resources that store them, and the service quality of SPARQL Protocol and RDF Query Language (SPARQL) endpoints. The first section describes how we developed RDF data, ontologies and tools in genomics, proteomics, metabolomics, glycomics and by literature text mining. The second section describes how we defined descriptions of datasets, the provenance of data, and quality assessment of services and service discovery. By enhancing the harmonization of these two layers of machine-readable data and knowledge, we improve the way community-wide resources are developed and published. Moreover, we outline best practices for the future and prepare ourselves for an exciting and unanticipatable variety of real-world applications in coming years.
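In the spirit of the endpoint service-quality work mentioned above, a simple availability probe can be written as an ASK query. This is a minimal sketch only; the endpoint URL is a placeholder and the check is not the monitoring method described in the review.

```python
# Minimal sketch: probing whether a SPARQL endpoint answers a trivial ASK
# query. The URL below is a placeholder, not one named in the article.
from SPARQLWrapper import SPARQLWrapper, JSON

def endpoint_is_alive(endpoint_url: str) -> bool:
    sparql = SPARQLWrapper(endpoint_url)
    sparql.setQuery("ASK { ?s ?p ?o }")
    sparql.setReturnFormat(JSON)
    try:
        return bool(sparql.query().convert().get("boolean", False))
    except Exception:
        return False

print(endpoint_is_alive("https://example.org/sparql"))
```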


2021 ◽  
Vol 50 (02) ◽  
Author(s):  
TẠ DUY CÔNG CHIẾN

In recent years, many applications in the Semantic Web, information retrieval, information extraction, and question answering have applied ontologies. To avoid conceptual and terminological confusion, an ontology is built as a taxonomy ontology that identifies and distinguishes concepts as well as terminology. It accomplishes this by specifying a set of generic concepts that characterizes the domain, together with their definitions and interrelationships. There are several ways to represent ontologies, such as the Resource Description Framework (RDF), the Web Ontology Language (OWL), or databases, depending on the characteristics of the data. RDF and OWL are usually used when the data are objects whose interrelationships are simple. When the relationships among the objects are more complex, storing the ontology in a database is the better approach. However, relational databases do not sufficiently support semantics-oriented search through the Structured Query Language (SQL), and search speed is slow. Therefore, this paper introduces an approach to extending queries for semantics-oriented search on a knowledge graph.
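To illustrate the general idea of extending a query with ontology knowledge, the sketch below expands a user term with synonyms and narrower concepts before building an SQL filter. The toy taxonomy, table name and expansion strategy are assumptions for illustration only, not the paper's method.

```python
# Illustrative sketch of query expansion for semantics-oriented search:
# a user term is expanded with synonyms and narrower concepts from a small
# in-memory taxonomy before the SQL query is assembled. Toy data only.
TAXONOMY = {
    "vehicle": {"synonyms": ["automobile"], "narrower": ["car", "truck"]},
    "car": {"synonyms": ["auto"], "narrower": ["sedan", "hatchback"]},
}

def expand_term(term: str) -> set[str]:
    """Collect the term plus its synonyms and all narrower concepts."""
    terms, stack = {term}, [term]
    while stack:
        entry = TAXONOMY.get(stack.pop(), {})
        terms.update(entry.get("synonyms", []))
        for narrower in entry.get("narrower", []):
            if narrower not in terms:
                terms.add(narrower)
                stack.append(narrower)
    return terms

def build_sql(term: str) -> str:
    """Turn the expanded term set into a simple SQL keyword filter."""
    keywords = sorted(expand_term(term))
    clauses = " OR ".join(f"label LIKE '%{k}%'" for k in keywords)
    return f"SELECT id, label FROM concepts WHERE {clauses};"

print(build_sql("vehicle"))
```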


Lung cancer is the second most common cancer in both men and women and the leading cause of cancer death worldwide. The American Cancer Society (ACS) estimates nearly 228,150 new cases of lung cancer and 142,670 deaths from lung cancer in the US for the year 2019. This paper proposes to build an ontology-based expert system to diagnose lung cancer and to identify its stage. An ontology is defined as a specification of a conceptualization and describes knowledge about a domain in the form of concepts and the relationships among them. It is a framework for representing shareable and reusable knowledge across a domain. The advantage of using ontologies for knowledge representation of a particular domain is that they are machine-readable. We designed a system named OBESLC (Ontology Based Expert System for Lung Cancer) for lung cancer diagnosis, in which the ontology is constructed using the Web Ontology Language (OWL) and the Resource Description Framework (RDF). The design of this system depends on knowledge about patients' symptoms and the state of lung nodules to build the knowledge base of lung cancer disease. We verified our ontology OBESLC by querying it using the SPARQL query language, a popular query language for extracting required information from the Semantic Web. We validated our ontology by developing reasoning rules in the Semantic Web Rule Language (SWRL). To provide the user interface, we implemented our approach in Java using the Jena API and the Eclipse editor.
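A minimal sketch of this kind of ontology querying is shown below using rdflib in Python rather than Jena. The file name, namespace and class/property names are hypothetical stand-ins; the actual OBESLC ontology is not reproduced here.

```python
# Minimal sketch: load an OWL/RDF ontology and run a SPARQL query over it.
# The ontology file and all IRIs are hypothetical placeholders.
from rdflib import Graph

g = Graph()
g.parse("obeslc.owl")  # hypothetical ontology file

query = """
PREFIX ex: <http://example.org/lungcancer#>
SELECT ?patient ?stage WHERE {
  ?patient ex:hasSymptom ex:PersistentCough ;
           ex:hasNoduleSize ?size ;
           ex:diagnosedStage ?stage .
  FILTER(?size > 3.0)
}
"""

for patient, stage in g.query(query):
    print(patient, stage)
```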


F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 881
Author(s):  
Sini Govindapillai ◽  
Lay-Ki Soon ◽  
Su-Cheng Haw

Knowledge graphs (KGs) publish machine-readable representations of knowledge on the Web. Structured data in a knowledge graph is published using the Resource Description Framework (RDF), where knowledge is represented as triples (subject, predicate, object). Due to the presence of erroneous, outdated or conflicting data in the knowledge graph, the quality of facts cannot be guaranteed. Therefore, the provenance of knowledge can assist in building up trust in these knowledge graphs. In this paper, we provide an analysis of the popular, general knowledge graphs Wikidata and YAGO4 with regard to the representation of provenance and context data. Since RDF does not directly support metadata for provenance and contextualization, an alternative method, RDF reification, is employed by most knowledge graphs. The trustworthiness of facts in a knowledge graph can be enhanced by the addition of metadata such as the source of the information and the location and time of the fact's occurrence. Wikidata employs qualifiers to attach metadata to facts, while YAGO4 collects metadata from Wikidata qualifiers. RDF reification increases the magnitude of the data, as several statements are required to represent a single fact. However, facts in Wikidata and YAGO4 can be fetched without using reification. Another limitation for applications that use provenance data is that not all facts in these knowledge graphs are annotated with provenance data. Structured data in knowledge graphs is noisy; therefore, the reliability of the data can be increased by provenance data. To the best of our knowledge, this is the first paper that investigates the method and extent of metadata addition in two prominent KGs, Wikidata and YAGO4.
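The following is a minimal sketch of standard RDF reification, showing how a single fact grows into several statements once provenance is attached. All IRIs are illustrative and are not drawn from Wikidata or YAGO4.

```python
# Minimal sketch of standard RDF reification: the single fact
# (ex:Einstein ex:educatedAt ex:ETH_Zurich) is restated as a reified
# statement so that provenance metadata can be attached to it.
from rdflib import Graph, Namespace, Literal, RDF

EX = Namespace("http://example.org/")
g = Graph()

fact = EX["stmt1"]
g.add((fact, RDF.type, RDF.Statement))
g.add((fact, RDF.subject, EX.Einstein))
g.add((fact, RDF.predicate, EX.educatedAt))
g.add((fact, RDF.object, EX.ETH_Zurich))
# Provenance is attached to the reified statement, not to a plain triple.
g.add((fact, EX.source, Literal("hypothetical source record")))

print(g.serialize(format="turtle"))
```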


Author(s):  
G. Hiebel ◽  
K. Hanke

The ancient mining landscape of Schwaz/Brixlegg in the Tyrol, Austria, witnessed mining from prehistoric to modern times, creating a first-order cultural landscape associated with one of the most important inventions in human history: the production of metal. In 1991, part of this landscape was lost due to an enormous landslide that reshaped part of the mountain. With our work we propose a digital workflow to create a 3D semantic representation of this ancient mining landscape and its mining structures in order to preserve it for posterity. First, we define a conceptual model to integrate the data. It is based on the CIDOC CRM ontology and CRMgeo for geometric data. To transform our information sources into a formal representation of the classes and properties of the ontology, we applied Semantic Web technologies and created a knowledge graph in RDF (Resource Description Framework). Through the CRMgeo extension, coordinate information of mining features can be integrated into the RDF graph and thus related to the detailed digital elevation model, which may be visualized together with the mining structures using geoinformation systems or 3D visualization tools. The RDF network in the triple store can be queried using the SPARQL query language. We created a snapshot of mining, settlement and burial sites in the Bronze Age. The results of the query were loaded into a geoinformation system, and a visualization of known Bronze Age sites related to mining, settlement and burial activities was created.
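A minimal sketch of such a triple-store query is given below. The endpoint URL is a placeholder, and while the class and property names follow common CIDOC CRM naming, the project's actual modelling choices are assumptions here.

```python
# Minimal sketch: querying a CIDOC CRM-style knowledge graph for sites and
# their coordinate geometries. Endpoint and modelling details are assumed.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://example.org/mining-kg/sparql")  # hypothetical
sparql.setQuery("""
PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>
SELECT ?site ?label ?coords WHERE {
  ?site a crm:E27_Site ;
        crm:P1_is_identified_by ?label ;
        crm:P168_place_is_defined_by ?coords .
}
LIMIT 50
""")
sparql.setReturnFormat(JSON)
for b in sparql.query().convert()["results"]["bindings"]:
    print(b["site"]["value"], b["label"]["value"], b["coords"]["value"])
```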


2021 ◽  
Vol 13 (13) ◽  
pp. 2511
Author(s):  
Xuejie Hao ◽  
Zheng Ji ◽  
Xiuhong Li ◽  
Lizeyan Yin ◽  
Lu Liu ◽  
...  

With the development and improvement of modern surveying and remote-sensing technology, data in the fields of surveying and remote sensing have grown rapidly. Because surveying and remote-sensing data are large-scale, heterogeneous, diverse, and loosely organized, effectively obtaining information and knowledge from them can be difficult. Therefore, this paper proposes a method that uses an ontology for heterogeneous data integration. Building on the heterogeneous, decentralized, and dynamically updated nature of large surveying and remote-sensing data, this paper constructs a knowledge graph for surveying and remote-sensing applications. First, the data are extracted. Second, using the ontology editing tool Protégé, the mode (schema) layer of the knowledge graph is constructed. The data are then stored in a relational database, and the D2RQ tool maps the data from the mode layer's ontology to the data layer. Next, through the D2RQ tool, a SPARQL Protocol and RDF Query Language (SPARQL) endpoint service provides functions such as querying and reasoning over the knowledge graph. A graph database is then used to display the knowledge graph. Finally, the knowledge graph is used to describe the correlations between the fields of surveying and remote sensing.
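A minimal sketch of querying the D2RQ-exposed endpoint follows. The local endpoint URL, vocabulary namespace and class names are assumptions for illustration; they are not taken from the paper.

```python
# Minimal sketch: querying the SPARQL endpoint that D2RQ exposes over the
# relational store. URL and vocabulary are placeholders (assumed).
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://localhost:2020/sparql")  # assumed D2RQ-style URL
sparql.setQuery("""
PREFIX rs: <http://example.org/remote-sensing#>
SELECT ?sensor ?platform WHERE {
  ?sensor a rs:Sensor ;
          rs:mountedOn ?platform .
}
LIMIT 25
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["sensor"]["value"], row["platform"]["value"])
```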


2013 ◽  
Vol 07 (04) ◽  
pp. 455-477 ◽  
Author(s):  
EDGARD MARX ◽  
TOMMASO SORU ◽  
SAEEDEH SHEKARPOUR ◽  
SÖREN AUER ◽  
AXEL-CYRILLE NGONGA NGOMO ◽  
...  

Over the last years, a considerable amount of structured data has been published on the Web as Linked Open Data (LOD). Despite recent advances, consuming and using Linked Open Data within an organization is still a substantial challenge. Many of the LOD datasets are quite large and, despite progress in Resource Description Framework (RDF) data management, loading and querying them within a triple store is extremely time-consuming and resource-demanding. To overcome this consumption obstacle, we propose a process inspired by the classical Extract-Transform-Load (ETL) paradigm. In this article, we focus particularly on the selection and extraction steps of this process. We devise a fragment of the SPARQL Protocol and RDF Query Language (SPARQL), dubbed SliceSPARQL, which enables the selection of well-defined slices of datasets fulfilling typical information needs. SliceSPARQL supports graph patterns for which each connected subgraph pattern involves a maximum of one variable or Internationalized Resource Identifier (IRI) in its join conditions. This restriction guarantees efficient processing of the query against a sequential dataset dump stream. Furthermore, we evaluate our slicing approach on three different optimization strategies. Results show that dataset slices can be generated an order of magnitude faster than with the conventional approach of loading the whole dataset into a triple store.
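To make the streaming idea concrete, the sketch below scans an N-Triples dump line by line and keeps only triples matching a fixed predicate, without loading anything into a triple store. It illustrates stream-based slicing in general and is not an implementation of SliceSPARQL itself; the file paths and predicate are placeholders.

```python
# Illustrative sketch of stream-based slicing over a gzipped N-Triples dump:
# keep only triples with a given predicate while reading sequentially.
import gzip

TARGET_PREDICATE = "<http://xmlns.com/foaf/0.1/name>"  # example predicate

def slice_dump(path: str, out_path: str) -> int:
    """Copy matching triples from the dump to out_path; return the count."""
    kept = 0
    with gzip.open(path, "rt", encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            parts = line.split(None, 2)  # subject, predicate, rest of line
            if len(parts) == 3 and parts[1] == TARGET_PREDICATE:
                dst.write(line)
                kept += 1
    return kept

# Example (paths are placeholders):
# print(slice_dump("dataset.nt.gz", "slice.nt"))
```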


Author(s):  
Maarten Trekels ◽  
Matt Woodburn ◽  
Deborah L Paul ◽  
Sharon Grant ◽  
Kate Webbink ◽  
...  

Data standards allow us to aggregate, compare, compute and communicate data from a wide variety of origins. However, for historical reasons, data are most likely to be stored in many different formats and to conform to different models. Every data set might contain a huge amount of information, but it becomes tremendously difficult to compare data sets without a common way to represent the data. That is where standards development comes in. Developing a standard is a formidable process, often involving many stakeholders. Typically, the initial blueprint of a standard is created by a limited number of people who have a clear view of their use cases. However, as development continues, additional stakeholders participate in the process. As a result, conflicting opinions and interests will influence the development of the standard. Compromises need to be made, and the standard might end up looking very different from the initial concept. In order to address the needs of the community, a high level of engagement in the development process is encouraged. However, this does not necessarily increase the usability of the standard. To mitigate this, there is a need to test the standard during the early stages of development. To facilitate this, we explored the use of Wikibase to create an initial implementation of the standard. Wikibase is the underlying technology that drives Wikidata. The software is open-source and can be customized for creating collaborative knowledge bases. In addition to containing an RDF (Resource Description Framework) triple store under the hood, it provides users with an easy-to-use graphical user interface (see Fig. 1). This facilitates the use of an implementation of a standard by non-technical users. The Wikibase remains fully flexible in the way data are represented, and no data model is enforced. This allows users to map their data onto the standard without any restrictions. Retrieving information from RDF data can be done through the SPARQL query language (W3C 2020). The software package also has a built-in SPARQL endpoint, allowing users to extract the relevant information: Does the standard cover all use cases envisioned? Are parts of the standard underdeveloped? Are the controlled vocabularies sufficient to describe the data? This strategy was applied during the development of the TDWG Collection Description standard. After completing a rough version of the standard, the different terms defined in the first version were transferred to a Wikibase instance running on WBStack (Addshore 2020). Initially, collection data were entered manually, which revealed several issues. The Wikibase allowed us to easily define controlled vocabularies and expand them as needed. The feedback reported by users then flowed back into the further development of the standard. Currently, we envisage creating automated scripts that will import data en masse from collections. Using the SPARQL query interface, it will then be straightforward to ensure that data can be extracted from the Wikibase to support the envisaged use cases.
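As a minimal sketch of this kind of coverage check, the query below counts how often each property is used in a Wikibase, which hints at which parts of a draft standard the test data actually exercise. The endpoint URL is a placeholder; the query assumes the default Wikibase RDF layout (wikibase:directClaim), not a structure specific to the Collection Description instance.

```python
# Minimal sketch: count property usage in a Wikibase SPARQL endpoint as a
# rough coverage check of a draft standard. Endpoint URL is hypothetical.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://example.wbstack.com/query/sparql")  # hypothetical
sparql.setQuery("""
PREFIX wikibase: <http://wikiba.se/ontology#>
SELECT ?property (COUNT(*) AS ?uses) WHERE {
  ?property wikibase:directClaim ?claim .
  ?item ?claim ?value .
}
GROUP BY ?property
ORDER BY DESC(?uses)
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["property"]["value"], row["uses"]["value"])
```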


2020 ◽  
Vol 1 (1) ◽  
pp. 39-69
Author(s):  
Maria Krommyda

Widely accepted standards, such as the Resource Description Framework, have provided unified ways of publishing data, aiming to facilitate the exchange of information between machines. This information has become of interest to a wider audience due to its volume and variety, but the available formats pose significant challenges to users with limited knowledge of the Semantic Web. The SPARQL query language lowers this barrier by facilitating the exploration of this information, and many data providers have created dedicated SPARQL endpoints for their data. Many efforts have been dedicated to the development of systems that provide access to and support the exploration of these endpoints in a semantically correct and user-friendly way. The main challenge for such approaches is the diversity of the information contained in the endpoints, which renders holistic or schema-specific solutions obsolete. We present here an integrated platform that supports users in querying, exploring and visualizing the information contained in SPARQL endpoints. The platform handles each query result independently, based only on its characteristics, offering an endpoint- and data-schema-agnostic solution. This is achieved through a Decision Support System, developed on top of a knowledge base containing information experimentally collected from many endpoints, which allows us to provide case-specific visualization strategies for SPARQL query results based exclusively on features extracted from the result.
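The toy sketch below conveys the general idea of choosing a visualization from features of a SPARQL result set alone (number of variables, datatypes of the bindings). The heuristics are invented for illustration and are not the decision rules of the platform described in the abstract.

```python
# Toy sketch: pick a visualization type from the shape of SPARQL JSON
# result bindings. Heuristics are illustrative only.
def choose_visualization(bindings: list[dict]) -> str:
    if not bindings:
        return "empty-message"
    variables = list(bindings[0].keys())
    sample = bindings[0]

    def is_numeric(v: dict) -> bool:
        return v.get("datatype", "").endswith(("integer", "decimal", "double"))

    numeric_vars = [v for v in variables if is_numeric(sample[v])]
    if any("wktLiteral" in sample[v].get("datatype", "") for v in variables):
        return "map"
    if len(variables) == 2 and len(numeric_vars) == 1:
        return "bar-chart"
    if len(numeric_vars) >= 2:
        return "scatter-plot"
    return "table"

# Example with one result row shaped like SPARQL JSON bindings:
row = {"country": {"type": "literal", "value": "Austria"},
       "population": {"type": "literal", "value": "8900000",
                      "datatype": "http://www.w3.org/2001/XMLSchema#integer"}}
print(choose_visualization([row]))  # -> "bar-chart"
```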


F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 881
Author(s):  
Sini Govindapillai ◽  
Lay-Ki Soon ◽  
Su-Cheng Haw

Knowledge graphs (KGs) publish machine-readable representations of knowledge on the Web. Structured data in a knowledge graph is published using the Resource Description Framework (RDF), where knowledge is represented as triples (subject, predicate, object). Due to the presence of erroneous, outdated or conflicting data in the knowledge graph, the quality of facts cannot be guaranteed. The trustworthiness of facts in a knowledge graph can be enhanced by the addition of metadata such as the source of the information and the location and time of the fact's occurrence. Since RDF does not directly support metadata for provenance and contextualization, an alternative method, RDF reification, is employed by most knowledge graphs. RDF reification increases the magnitude of the data, as several statements are required to represent a single fact. Another limitation for applications that use provenance data, such as those in the medical domain and in cyber security, is that not all facts in these knowledge graphs are annotated with provenance data. In this paper, we provide an overview of prominent reification approaches together with an analysis of the popular, general knowledge graphs Wikidata and YAGO4 with regard to the representation of provenance and context data. Wikidata employs qualifiers to attach metadata to facts, while YAGO4 collects metadata from Wikidata qualifiers. However, facts in Wikidata and YAGO4 can be fetched without using reification, to cater for applications that do not require metadata. To the best of our knowledge, this is the first paper that investigates the method and the extent of metadata covered by two prominent KGs, Wikidata and YAGO4.
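As a minimal sketch of reading qualifier metadata from Wikidata's statement nodes (the p:/ps:/pq: layout) rather than the direct wdt: triples, the query below retrieves educated-at statements for a well-known example item together with their start-time qualifier, where present. Adjust the entity and properties to the facts of interest.

```python
# Minimal sketch: fetch a statement's qualifier (start time) from Wikidata
# via its public SPARQL endpoint, using the statement-node layout.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://query.wikidata.org/sparql",
                       agent="kg-provenance-demo/0.1")
sparql.setQuery("""
SELECT ?schoolLabel ?start WHERE {
  wd:Q42 p:P69 ?statement .                 # educated-at statements
  ?statement ps:P69 ?school .               # the statement's value
  OPTIONAL { ?statement pq:P580 ?start . }  # start-time qualifier
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["schoolLabel"]["value"], row.get("start", {}).get("value", ""))
```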

