Multi-context Knowledge Base using Calculated Descriptors from Xper3: the Archaeocyaths Knowledge Base example

Author(s): Adeline Kerner, Régine Vignes Lebbe

Natural sciences need to make assertions about the characteristics of taxa. Traits and qualities, or descriptors and states, are an increasingly crucial resource for identification tools adapted to both scientists and the public. Specialists, non-specialists, and the general public need different strategies for accessing the information. Creating a knowledge base is time-consuming, and adapting it to several needs seems to increase the required time substantially. Specialists often assume that an identification tool requires a complete overhaul whenever the target audience or the language changes, so they are reluctant to get involved. How can creation and update time be minimized? How can data be aggregated only once in order to create a single knowledge base with different access levels? Strategies for integrating different patterns of descriptors in a single knowledge base make it possible to modulate descriptions without loss of information and with the certainty that everything is up to date in each context. Such a multi-context knowledge base, derived from a single trait dataset, can generate descriptions adapted to different contexts and users. To address this issue, we propose using calculated descriptors so that a single knowledge base can hold different versions of its descriptors that are updated automatically whenever the reference trait is modified. Calculated descriptors are a distinctive feature of Xper3: they are computed automatically from other descriptors by using logical (Boolean) operators. Xper3 (http://www.xper3.com) is a web platform that manages descriptive data and provides interactive identification keys. Xper3 and its previous version, Xper2, have already been used for various taxonomic groups. We will focus on fossils in order to show how calculated descriptors in Xper3 knowledge bases can solve the multi-context problem. The main source of content is the archaeocyaths knowledge base (http://archaeocyatha.infosyslab.fr). Archaeocyaths, calcified sponges without spicules, were the first reef-building animals of the Cambrian. The archaeocyaths knowledge base is an efficient resource for scientific studies and a useful tool for non-specialists, especially with the support of calculated descriptors. A correspondence between archaeocyath and sponge morphologies is not yet complete, but it will be integrated in the short term into PORO (the Porifera Ontology, an anatomy ontology of sponges, http://purl.obolibrary.org/obo/poro/releases/2014-03-06/). In this knowledge base, calculated descriptors are used to: create a consistent multilingual interactive identification key (French and English are available and Russian is in draft), generate descriptors adapted to different levels of expertise, and reword morphological descriptors (adapted for identification) into homologous characters (adapted for phylogeny). Xper2 and Xper3 are compatible with TDWG’s Structured Descriptive Data (SDD) format. Calculated descriptors do not exist in the SDD format, so they are exported from Xper3 as plain categorical descriptors, losing the record of how their values were derived.
Calculated descriptors are powerful, and we are interested in discussing them with SDD and Xper3 users in order to improve the user interface and to develop new tools for the analysis of such descriptors.
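To make the mechanism concrete, here is a minimal sketch, in Python, of how a calculated descriptor can be derived from reference descriptors with Boolean operators. The descriptor and state names are invented for illustration; this is not the Xper3 implementation.

```python
# Minimal sketch (not the Xper3 implementation): a calculated descriptor
# defined as Boolean rules over the states of reference descriptors.
# Descriptor and state names are hypothetical examples.

# Expert-level description of an item: descriptor -> set of observed states
expert_description = {
    "intervallum structures": {"septa", "tabulae"},
    "outer wall pores": {"simple pores"},
}

# A calculated descriptor maps each of its states to a Boolean rule
# evaluated against the reference descriptors.
calculated_descriptor = {
    "skeleton complexity": {
        "complex": lambda d: "septa" in d.get("intervallum structures", set())
                             and "tabulae" in d.get("intervallum structures", set()),
        "simple": lambda d: not (
            "septa" in d.get("intervallum structures", set())
            and "tabulae" in d.get("intervallum structures", set())
        ),
    }
}

def compute(calc, description):
    """Evaluate every calculated state; keep those whose rule is true."""
    return {name: {state for state, rule in states.items() if rule(description)}
            for name, states in calc.items()}

print(compute(calculated_descriptor, expert_description))
# -> {'skeleton complexity': {'complex'}}
```

Because the calculated states are recomputed from the reference descriptors on demand, editing the reference data automatically keeps every derived context (simplified, multilingual, or phylogenetic) up to date.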

Author(s): Adeline Kerner, Sylvain Bouquin, Rémy Portier, Régine Vignes Lebbe

The Xper3 platform was launched in November 2013 (Saucède et al. 2020). Xper3 is a free web platform that manages descriptive data and provides interactive identification keys. It is a follow-up to Xper (Forget et al. 1986) and Xper2 (Ung et al. 2010). Xper3 is used via web browsers and offers a collaborative, multi-user interface without local installation. It is compatible with TDWG’s Structured Descriptive Data (SDD) format. Xper3 and its previous version, Xper2, have already been used for various taxonomic groups. By June 2021, 4,743 users had created accounts and edited 5,756 knowledge bases. Each knowledge base is autonomous and can be published as a free-access key link, as a data paper, or on websites. The risk of this autonomy, and of the poor visibility of existing knowledge bases, is duplicated content and overlapping effort. Increasingly, users have asked for a public overview of the existing content. A first version of a search tool, Explorer, is now available online; it lists the knowledge bases whose creators have filled in the extended metadata and accepted referencing. Users can search by language, taxonomic group, fossil or extant taxa, geography, habitat, and keywords. New developments of Xper3 are in progress: some already have a first version online, others are in production, and the rest are future projects. We will present an overview of these current and planned projects. Calculated descriptors are a distinctive feature of Xper3 (Kerner and Vignes Lebbe 2019). These descriptors are automatically computed from other descriptors by using logical (Boolean) operators. The use of calculated descriptors remains rare, and they need to be promoted in order to gather more feedback and improve them. The link between Xper3 and Annotate continues to improve (Hays and Kerner 2020). Annotate offers the possibility of tagging images with controlled vocabularies structured in Xper3. An export from Annotate to Xper3 then automatically fills the Xper3 knowledge base with the descriptions (annotations and numerical measures) of virtual specimens, which can then be compared to construct species descriptions. Future developments in progress will modify the Xper3 architecture in order to offer the same functionalities in both the local and online versions and to allow various user interfaces to be built on the same knowledge bases. Xper2-specific features still have to be added to Xper3, such as merging states, adding notes, adding definitions and/or illustrations in the description tab, and offering different ways of sorting and filtering descriptors during an identification (by group, identification power, alphabetical order, or specialist’s choice).
A new tab in Xper3’s interface is being implemented to give access to various analysis tools, via an API (Application Programming Interface) or R programming code (a minimal MINSET-style sketch is given below):
MINSET: minimum list of descriptors sufficient to discriminate all items
MINDESCR: minimum set of descriptors to discriminate an item
DESCRXP: generating a description in natural language
MERGEMOD: proposing to merge states without loss of discriminating power
DISTINXP, DISTVAXP: computing similarities between items or descriptors
One last project that we would like to implement is interoperability between Xper3, biodiversity data platforms (e.g., the Global Biodiversity Information Facility, GBIF) and bio-ontologies. An ID field already exists for adding Universally Unique IDentifiers (UUIDs) to taxa. ID fields have to be added for descriptors and states in order to link them with ontologies, e.g., the Phenotypic Quality Ontology (PATO) or the Plant Ontology (PO). We are interested in discussing future developments to further improve the user interface and develop new tools for the analysis of knowledge bases.
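As a rough illustration of what a MINSET-style analysis computes, the following sketch exhaustively searches a toy item-by-descriptor matrix for a smallest descriptor set that still discriminates all items. The data are invented, the brute-force search only suits tiny matrices, and this is neither the Xper3 API nor the R code mentioned above.

```python
from itertools import combinations

# Toy item-by-descriptor matrix (hypothetical data, not an Xper3 export).
items = {
    "Taxon A": {"wall": "porous",  "septa": "present", "tabulae": "absent"},
    "Taxon B": {"wall": "porous",  "septa": "absent",  "tabulae": "present"},
    "Taxon C": {"wall": "aporous", "septa": "present", "tabulae": "present"},
}

def discriminates(descriptors, items):
    """True if the chosen descriptors give every item a distinct profile."""
    profiles = [tuple(desc[d] for d in descriptors) for desc in items.values()]
    return len(set(profiles)) == len(profiles)

def minset(items):
    """Smallest descriptor subset separating all items (exhaustive on toy data)."""
    all_descriptors = sorted(next(iter(items.values())))
    for size in range(1, len(all_descriptors) + 1):
        for subset in combinations(all_descriptors, size):
            if discriminates(subset, items):
                return subset
    return tuple(all_descriptors)

print(minset(items))  # -> ('septa', 'tabulae')
```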


2020
Author(s): Matheus Pereira Lobo

This paper highlights two categories of knowledge bases: one built as a repository of links, and the other based on units of knowledge.


2018, Vol 2, pp. e25614
Author(s): Florian Pellen, Sylvain Bouquin, Isabelle Mougenot, Régine Vignes-Lebbe

Xper3 (Vignes Lebbe et al. 2016) is a collaborative knowledge base publishing platform that, since its launch in November 2013, has been adopted by over 2,000 users (Pinel et al. 2017). This is mainly due to its user-friendly interface and the simplicity of its data model. The data are stored in MySQL relational databases, but the exchange format uses the TDWG standard format SDD (Structured Descriptive Data, Hagedorn et al. 2005). However, each Xper3 knowledge base is a closed world that the author(s) may or may not share with the scientific community or the public by publishing its content and/or an identification key (Kopfstein 2016). The explicit taxonomic, geographic and phenotypic limits of a knowledge base are not always well defined in the metadata fields. Conversely, terminology vocabularies, such as the Phenotype and Trait Ontology (PATO) and the Plant Ontology (PO), and the software used to edit them, such as Protégé and Phenoscape, are essential in the Semantic Web but difficult to handle for biologists without computer skills. These ontologies constitute open worlds and are themselves expressed as RDF (Resource Description Framework) triples. Protégé offers visualization and reasoning capabilities for these ontologies (Gennari et al. 2003, Musen 2015). Our challenge is to combine the user-friendliness of Xper3 with the expressive power of OWL (Web Ontology Language), the W3C standard for building ontologies. We therefore focused on analyzing the representation of the same taxonomic content under Xper3 and under different models in OWL. After this critical analysis, we chose a description model that allows automatic export of SDD to OWL and can easily be enriched. We will present the results obtained and their validation on two knowledge bases, one on parasitic crustaceans (Sacculina) and the second on extant and fossil ferns (Corvez and Grand 2014). The evolution of the Xper3 platform and the perspectives offered by this link with Semantic Web standards will be discussed.
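As an illustration of the kind of SDD-to-OWL mapping discussed here, the sketch below builds a few OWL/RDFS triples with rdflib, turning a descriptor into a class and one of its states into a subclass. The namespace, the names, and the modelling choice of attaching a taxon as a subclass of a state are assumptions made for illustration, not the description model actually selected by the authors.

```python
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import OWL, RDF, RDFS

# Hypothetical namespace; the export model chosen by the authors may differ.
EX = Namespace("http://example.org/xper3/")

g = Graph()
g.bind("ex", EX)
g.bind("owl", OWL)

# One possible mapping: a descriptor becomes an OWL class, each state a subclass.
g.add((EX.SporangiumPosition, RDF.type, OWL.Class))
g.add((EX.SporangiumPosition, RDFS.label, Literal("sporangium position", lang="en")))

g.add((EX.Marginal, RDF.type, OWL.Class))
g.add((EX.Marginal, RDFS.subClassOf, EX.SporangiumPosition))
g.add((EX.Marginal, RDFS.label, Literal("marginal", lang="en")))

# A taxon described by that state (hypothetical taxon name).
g.add((EX.SomeFern, RDF.type, OWL.Class))
g.add((EX.SomeFern, RDFS.subClassOf, EX.Marginal))

print(g.serialize(format="turtle"))
```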


Author(s): Heiko Paulheim, Christian Bizer

Linked Data on the Web is either created from structured data sources (such as relational databases), from semi-structured sources (such as Wikipedia), or from unstructured sources (such as text). In the latter two cases, the generated Linked Data will likely be noisy and incomplete. In this paper, we present two algorithms that exploit statistical distributions of properties and types to enhance the quality of incomplete and noisy Linked Data sets: SDType adds missing type statements, and SDValidate identifies faulty statements. Neither algorithm uses external knowledge, i.e., they operate only on the data itself. We evaluate the algorithms on the DBpedia and NELL knowledge bases, showing that they are both accurate and scalable. Both algorithms have been used for building the DBpedia 3.9 release: with SDType, 3.4 million missing type statements were added, while with SDValidate, 13,000 erroneous RDF statements were removed from the knowledge base.
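The following is a simplified, hypothetical illustration of the weighted-voting idea behind SDType: each property of a resource votes for candidate types according to the statistical type distribution observed for that property, weighted by how type-specific the property is. The numbers are toy values and the formulation is deliberately reduced; it is not the published algorithm.

```python
from collections import defaultdict

# Toy statistics (invented): for each property, the observed distribution of
# subject types across the knowledge base, plus a weight reflecting how
# type-specific the property is. SDType derives both from the data itself.
type_distribution = {
    "dbo:birthPlace": {"dbo:Person": 0.90, "dbo:Organisation": 0.05},
    "dbo:spouse":     {"dbo:Person": 0.95, "dbo:FictionalCharacter": 0.03},
}
property_weight = {"dbo:birthPlace": 0.8, "dbo:spouse": 0.7}

def sdtype_vote(properties, threshold=0.3):
    """Score candidate types by weighted voting over a resource's properties."""
    scores, total_weight = defaultdict(float), 0.0
    for p in properties:
        w = property_weight.get(p, 0.0)
        total_weight += w
        for t, share in type_distribution.get(p, {}).items():
            scores[t] += w * share
    if total_weight == 0:
        return {}
    return {t: s / total_weight for t, s in scores.items()
            if s / total_weight >= threshold}

# An untyped resource that appears as subject of dbo:birthPlace and dbo:spouse.
print(sdtype_vote(["dbo:birthPlace", "dbo:spouse"]))
# -> {'dbo:Person': 0.92...}: the missing rdf:type dbo:Person would be added.
```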


Author(s): Yongrui Chen, Huiying Li, Yuncheng Hua, Guilin Qi

Formal query building is an important part of complex question answering over knowledge bases. It aims to build correct executable queries for questions. Recent methods try to rank candidate queries generated by a state-transition strategy. However, this candidate generation strategy ignores the structure of queries, resulting in a considerable number of noisy queries. In this paper, we propose a new formal query building approach that consists of two stages. In the first stage, we predict the query structure of the question and leverage the structure to constrain the generation of the candidate queries. We propose a novel graph generation framework to handle the structure prediction task and design an encoder-decoder model to predict the argument of the predetermined operation in each generative step. In the second stage, we follow the previous methods to rank the candidate queries. The experimental results show that our formal query building approach outperforms existing methods on complex questions while staying competitive on simple questions.
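A schematic sketch of the first stage described above, under the assumption that a query structure can be summarised by its shape (number of triple patterns and distinct variables): candidate queries that do not match the predicted shape are never generated, and only the survivors are passed to the ranking stage. The patterns, the shape signature, and the relation names are placeholders, not the paper's neural models.

```python
from itertools import combinations

def structure_signature(triples):
    """Shape of a candidate query: number of triple patterns and distinct variables."""
    variables = {x for s, _, o in triples for x in (s, o) if x.startswith("?")}
    return (len(triples), len(variables))

# Candidate triple patterns produced by some upstream entity/relation linker
# (hypothetical examples).
patterns = [
    ("?film", ":starring", ":Alice_Example"),
    ("?film", ":director", "?answer"),
    ("?answer", ":birthPlace", "?city"),
]

# Stage 1 output (assumed): two triple patterns sharing variables, two variables.
predicted_shape = (2, 2)

# Only structure-consistent candidates are generated.
candidates = [list(c) for c in combinations(patterns, 2)
              if structure_signature(c) == predicted_shape]
print(candidates)  # the single surviving candidate query
# Stage 2 (not shown) would rank the surviving candidates against the question.
```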


2016, Vol 31 (2), pp. 97-123
Author(s): Alfred Krzywicki, Wayne Wobcke, Michael Bain, John Calvo Martinez, Paul Compton

Data mining techniques for extracting knowledge from text have been applied extensively to applications including question answering, document summarisation, event extraction and trend monitoring. However, current methods have mainly been tested on small-scale customised data sets for specific purposes. The availability of large volumes of data and high-velocity data streams (such as social media feeds) motivates the need to automatically extract knowledge from such data sources and to generalise existing approaches to more practical applications. Recently, several architectures have been proposed for what we call knowledge mining: integrating data mining for knowledge extraction from unstructured text (possibly making use of a knowledge base), and at the same time, consistently incorporating this new information into the knowledge base. After describing a number of existing knowledge mining systems, we review the state-of-the-art literature on both current text mining methods (emphasising stream mining) and techniques for the construction and maintenance of knowledge bases. In particular, we focus on mining entities and relations from unstructured text data sources, entity disambiguation, entity linking and question answering. We conclude by highlighting general trends in knowledge mining research and identifying problems that require further research to enable more extensive use of knowledge bases.


2015, Vol 2015, pp. 1-11
Author(s): Alexander Falenski, Armin A. Weiser, Christian Thöns, Bernd Appel, Annemarie Käsbohrer, ...

In case of contamination in the food chain, fast action is required in order to reduce the number of affected people. In such situations, being able to predict the fate of agents in foods would help risk assessors and decision makers assess the potential effects of a specific contamination event and thus deduce the appropriate mitigation measures. One efficient strategy supporting this is the use of model-based simulations. However, application in crisis situations requires ready-to-use and easy-to-adapt models to be available from so-called food safety knowledge bases. Here, we illustrate this concept and its benefits by applying the modular open-source software tools PMM-Lab and FoodProcess-Lab. As a fictitious sample scenario, an intentional ricin contamination at a beef salami production facility was modelled. Predictive models describing the inactivation of ricin were reviewed, relevant models were implemented with PMM-Lab, and simulations of residual toxin amounts in the final product were performed with FoodProcess-Lab. Due to the generic and modular modelling concept implemented in these tools, they can be applied to simulate virtually any food safety contamination scenario. Apart from the application in crisis situations, the food safety knowledge base concept will also be useful in food quality and safety investigations.
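As a generic illustration of the kind of predictive model such a process chain can evaluate, the sketch below applies a first-order (log-linear, D-value based) inactivation step across a hypothetical sequence of process steps and reports the residual amount after each step. The parameter values are invented and do not reflect the ricin models reviewed in the paper or the PMM-Lab/FoodProcess-Lab implementations.

```python
def log_linear_inactivation(initial_amount, d_value_min, time_min):
    """First-order (log-linear) inactivation: one D-value = 90 % reduction."""
    return initial_amount * 10 ** (-time_min / d_value_min)

# Hypothetical process chain for a salami-style product (toy parameters only).
steps = [
    {"name": "fermentation",   "d_value_min": 5000.0, "time_min": 2880.0},
    {"name": "heat treatment", "d_value_min": 30.0,   "time_min": 60.0},
    {"name": "drying",         "d_value_min": 8000.0, "time_min": 10080.0},
]

amount = 100.0  # arbitrary initial toxin amount (e.g. mg per batch)
for step in steps:
    amount = log_linear_inactivation(amount, step["d_value_min"], step["time_min"])
    print(f"after {step['name']}: {amount:.3g}")
```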


Author(s): Christopher Walton

In the introductory chapter of this book, we discussed the means by which knowledge can be made available on the Web. That is, the representation of the knowledge in a form by which it can be automatically processed by a computer. To recap, we identified two essential steps that were deemed necessary to achieve this task:
1. We discussed the need to agree on a suitable structure for the knowledge that we wish to represent. This is achieved through the construction of a semantic network, which defines the main concepts of the knowledge, and the relationships between these concepts. We presented an example network that contained the main concepts to differentiate between kinds of cameras. Our network is a conceptualization, or an abstract view of a small part of the world. A conceptualization is defined formally in an ontology, which is in essence a vocabulary for knowledge representation.
2. We discussed the construction of a knowledge base, which is a store of knowledge about a domain in machine-processable form; essentially a database of knowledge. A knowledge base is constructed through the classification of a body of information according to an ontology. The result will be a store of facts and rules that describe the domain. Our example described the classification of different camera features to form a knowledge base. The knowledge base is expressed formally in the language of the ontology over which it is defined.
In this chapter we elaborate on these two steps to show how we can define ontologies and knowledge bases specifically for the Web. This will enable us to construct Semantic Web applications that make use of this knowledge. The chapter is devoted to a detailed explanation of the syntax and pragmatics of the RDF, RDFS, and OWL Semantic Web standards. The Resource Description Framework (RDF) is an established standard for knowledge representation on the Web. Taken together with the associated RDF Schema (RDFS) standard, we have a language for representing simple ontologies and knowledge bases on the Web.
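As a small, self-contained illustration of the RDF/RDFS machinery described here, the sketch below parses a tiny camera vocabulary plus one knowledge-base fact with rdflib and queries it. The class and property names are hypothetical and may differ from the chapter's own example.

```python
from rdflib import Graph

# A tiny RDFS vocabulary plus one knowledge-base fact about a specific camera;
# class and property names are hypothetical, not the chapter's exact example.
turtle = """
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/cameras#> .

ex:Camera        a rdfs:Class .
ex:DigitalCamera a rdfs:Class ; rdfs:subClassOf ex:Camera .

ex:hasResolution a rdf:Property ;
    rdfs:domain ex:DigitalCamera ;
    rdfs:range  rdfs:Literal .

ex:myCamera a ex:DigitalCamera ;
    ex:hasResolution "24 megapixels" .
"""

g = Graph()
g.parse(data=turtle, format="turtle")

# Query the knowledge base: which resources are digital cameras?
for row in g.query(
    "SELECT ?c WHERE { ?c a <http://example.org/cameras#DigitalCamera> }"
):
    print(row.c)
```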


2020, Vol 20 (1)
Author(s): Christian Gerdesköld, Eva Toth-Pal, Inger Wårdh, Gunnar H. Nilsson, Anna Nager

Background: Evidence-based information available at the point of care improves patient care outcomes. Online knowledge bases can increase the application of evidence-based medicine and influence patient outcome data, which may be captured in quality registries. The aim of this study was to explore the effect of use of an online knowledge base on patient experiences and health care quality.
Methods: The study was conducted as a retrospective, observational study of 24 primary health care centers in Sweden, exploring their use of an online knowledge base. Frequency of use was compared to patient outcomes in two national quality registries. A socio-economic Care Need Index was applied to assess whether the burden of care influenced the results from those quality registries. Non-parametric statistical methods and linear regression were used.
Results: Frequency of knowledge base use showed two groups, frequent and non-frequent users, with a significant difference in use between the groups (p < 0.001). Outcome data showed significantly higher values for all seven National Primary Care Patient Survey dimensions in the frequent compared to the non-frequent knowledge base users (p < 0.001), whereas 10 out of 11 parameters in the National Diabetes Register showed no differences between the groups (p > 0.05). Adjusting for the Care Need Index had almost no effect on the outcomes for the groups.
Conclusions: Frequent users of a national online knowledge base received higher ratings on patient experiences, whereas figures on health care quality in diabetes showed almost no correlation. The findings indicate that some effects may be attributed to the use of knowledge bases, but this requires a controlled evaluation.

