The 8 Years of Existence of Xper3: State of the art and future developments of the platform

Author(s):  
Adeline Kerner ◽  
Sylvain Bouquin ◽  
Rémy Portier ◽  
Régine Vignes Lebbe

The Xper3 platform was launched in November 2013 (Saucède et al. 2020). Xper3 is a free web platform that manages descriptive data and provides interactive identification keys. It is a follow-up to Xper (Forget et al. 1986) and Xper2 (Ung et al. 2010). Xper3 is used via web browsers and offers a collaborative, multi-user interface without local installation. It is compatible with TDWG’s Structured Descriptive Data (SDD) format. Xper3 and its previous version, Xper2, have already been used for various taxonomic groups. In June 2021, 4743 users had created accounts and edited 5756 knowledge bases. Each knowledge base is autonomous and can be published as a free-access key link, as a data paper, in publications, or on websites. This autonomy, combined with the limited visibility of existing knowledge bases, carries a risk of duplicated content and overlapping effort. Increasingly, users have asked for a public overview of the existing content. A first version of a search tool is now available online. Explorer lists the databases whose creators have filled in the extended metadata and agreed to be referenced. The user can search by language, taxonomic group, fossil or extant, geography, habitat, and keywords. New developments of Xper3 are in progress: some have a first version online, others are in production, and the rest are future projects. We present an overview of the different projects in progress and planned. Calculated descriptors are a distinctive feature of Xper3 (Kerner and Vignes Lebbe 2019). These descriptors are automatically computed from other descriptors by using logical operators (Boolean operators). The use of calculated descriptors remains rare; they need to be promoted to encourage more feedback and improve them. The link between Xper3 and Annotate continues to improve (Hays and Kerner 2020).
Annotate offers the possibility of tagging images with controlled vocabularies structured in Xper3. An export from Annotate to Xper3 then automatically fills in the Xper3 knowledge base with the descriptions (annotations and numerical measurements) of virtual specimens, allowing specimens to be compared to construct species descriptions, etc. Future developments in progress will modify the Xper3 architecture in order to offer the same functionalities in both local and online versions and to allow various user interfaces on top of the same knowledge bases. Xper2-specific features, such as merging states, adding notes, adding definitions and/or illustrations in the description tab, and different ways of sorting and filtering descriptors during an identification (by group, identification power, alphabetical order, or specialist’s choice), have to be added to Xper3. A new tab in Xper3’s interface is being implemented to give access to various analysis tools, via an API (Application Programming Interface) or R code: MINSET, the minimum list of descriptors sufficient to discriminate all items; MINDESCR, the minimum set of descriptors needed to discriminate one item; DESCRXP, which generates a description in natural language; MERGEMOD, which proposes merging states without loss of discriminating power; and DISTINXP and DISTVAXP, which compute similarities between items or descriptors. One last project that we would like to implement is interoperability between Xper3, biodiversity data platforms (e.g., the Global Biodiversity Information Facility, GBIF), and bio-ontologies. An ID field already exists to add Universally Unique IDentifiers (UUIDs) for taxa.
ID fields have to be added for descriptors and states to link them with ontologies (e.g., the Phenotypic Quality Ontology, PATO, and the Plant Ontology, PO). We are interested in discussing future developments to further improve the user interface and develop new tools for the analysis of knowledge bases.
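The analysis tools named in the abstract lend themselves to a compact illustration. As a minimal sketch, under the assumption that MINSET means a smallest descriptor subset giving every item a unique description (the item names, descriptors, and data below are invented for illustration, not taken from any real Xper3 knowledge base or from the actual API):

```python
from itertools import combinations

# Toy knowledge base: each item (taxon) maps descriptors to observed states.
# All names and states are hypothetical.
ITEMS = {
    "taxon_A": {"cup_shape": "conical", "wall": "single", "septa": "present"},
    "taxon_B": {"cup_shape": "conical", "wall": "double", "septa": "present"},
    "taxon_C": {"cup_shape": "tubular", "wall": "double", "septa": "absent"},
}

def discriminates(descriptors, items):
    """True if the chosen descriptors give every item a unique profile."""
    profiles = [tuple(desc[d] for d in descriptors) for desc in items.values()]
    return len(set(profiles)) == len(profiles)

def minset(items):
    """Smallest descriptor subset separating all items (exhaustive search)."""
    all_descriptors = sorted(next(iter(items.values())))
    for size in range(1, len(all_descriptors) + 1):
        for subset in combinations(all_descriptors, size):
            if discriminates(subset, items):
                return list(subset)
    return all_descriptors

print(minset(ITEMS))  # ['cup_shape', 'wall'] separates all three toy taxa
```

Exhaustive search is exponential in the number of descriptors; a production tool would presumably use a heuristic, but the definition of the problem is the same.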

2020 ◽  
Vol 12 (3) ◽  
pp. 45
Author(s):  
Wenqing Wu ◽  
Zhenfang Zhu ◽  
Qiang Lu ◽  
Dianyuan Zhang ◽  
Qiangqiang Guo

Knowledge base question answering (KBQA) aims to analyze the semantics of natural language questions and return accurate answers from the knowledge base (KB). More and more studies have applied knowledge bases to question answering systems, and when a KB is used to answer a natural language question, some words imply tense (e.g., “original” and “previous”) and play a limiting role in the question. However, most existing methods for KBQA cannot model a question with implicit temporal constraints. In this work, we propose a model based on a bidirectional attentive memory network, which obtains the temporal information in the question through attention mechanisms and external knowledge. Specifically, we encode the external knowledge as vectors and use additive attention between the question and the external knowledge to obtain the temporal information, then further enhance the question vector to increase accuracy. On the WebQuestions benchmark, our method not only performs better on the overall data, but also has excellent performance on questions with implicit temporal constraints, which are evaluated separately from the overall data. As we use attention mechanisms, our method also offers better interpretability.
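The additive-attention step described above can be sketched compactly. The following is an illustrative fragment, not the authors' implementation: the dimensions, weight matrices, and the final enhancement by vector addition are all assumptions chosen to show the standard additive (Bahdanau-style) scoring form:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hidden size (illustrative)

# Question encoding and external-knowledge encodings (e.g., tense cues).
q = rng.normal(size=(d,))
knowledge = rng.normal(size=(4, d))  # 4 external-knowledge vectors

# Additive attention: score_i = v^T tanh(W_q q + W_k k_i)
W_q = rng.normal(size=(d, d))
W_k = rng.normal(size=(d, d))
v = rng.normal(size=(d,))

scores = np.tanh(q @ W_q.T + knowledge @ W_k.T) @ v  # shape (4,)
weights = np.exp(scores - scores.max())
weights /= weights.sum()                             # softmax over knowledge

temporal_context = weights @ knowledge               # weighted sum, shape (d,)
q_enhanced = q + temporal_context                    # enhanced question vector

print(weights.round(3), q_enhanced.shape)
```

Because the weights are an explicit distribution over the external-knowledge vectors, they can be inspected directly, which is the interpretability benefit the abstract points to.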


Doklady BGUIR ◽  
2020 ◽  
Vol 18 (5) ◽  
pp. 44-52
Author(s):  
Li Wenzu

This article proposes an approach for designing a general subsystem for the automatic generation of questions in intelligent learning systems. The designed subsystem allows various types of questions to be generated automatically from information in the knowledge bases and saves the generated questions in the subsystem knowledge base for future use. The main part of the subsystem is the automatic question generation module, which generates questions of various types based on existing question generation strategies combined with the structural characteristics of knowledge bases built using OSTIS technology. This article proposes a variety of strategies for automatic question generation, whose use allows various types of questions to be generated automatically, such as multiple-choice questions, fill-in-the-blank questions, definition interpretation questions, etc. The most important part of the subsystem is the knowledge base, which stores the ontology of questions, including the question instances themselves. In this article, the knowledge base is constructed according to OSTIS technical standards. A type classification of automatically generated questions was developed, as well as the subject area for storing generated questions and the corresponding ontology described in the knowledge base of the subsystem. The generated questions are stored in the subsystem knowledge base in the form of SC-code, the OSTIS technology standard. When testing users, these automatically generated questions are converted to the corresponding natural language form through the natural language interface. Compared with existing approaches, the approach proposed in this article has certain advantages, and a subsystem designed using it can be used in various systems built on OSTIS technology.
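The generation strategies mentioned above can be illustrated with a toy sketch. This is not the OSTIS SC-code representation or the article's actual module; the triples, templates, and helper names are invented to show how fill-in-the-blank and multiple-choice questions can be derived from structured facts:

```python
import random

# Toy knowledge fragment as subject-relation-object triples (illustrative).
TRIPLES = [
    ("triangle", "number of sides", "3"),
    ("square", "number of sides", "4"),
    ("pentagon", "number of sides", "5"),
]

def fill_in_the_blank(triple):
    """Turn one triple into a fill-in-the-blank question and its answer."""
    s, r, o = triple
    return f"The {r} of a {s} is ____.", o

def multiple_choice(triple, triples, n_distractors=2, seed=0):
    """Distractors are objects of other triples sharing the same relation."""
    s, r, o = triple
    pool = [t[2] for t in triples if t[1] == r and t[2] != o]
    rnd = random.Random(seed)
    options = rnd.sample(pool, min(n_distractors, len(pool))) + [o]
    rnd.shuffle(options)
    return f"What is the {r} of a {s}?", options, o

question, answer = fill_in_the_blank(TRIPLES[0])
print(question, "->", answer)
mc_question, options, key = multiple_choice(TRIPLES[0], TRIPLES)
print(mc_question, options, "answer:", key)
```

Drawing distractors from triples with the same relation keeps the wrong options plausible, which is the usual design choice for KB-driven multiple-choice generation.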


2007 ◽  
pp. 86-113 ◽  
Author(s):  
Son B. Pham ◽  
Achim Hoffmann

In this chapter we discuss ways of assisting experts to develop complex knowledge bases for a variety of natural language processing tasks. The proposed techniques are embedded into an existing knowledge acquisition framework, KAFTIE, specifically designed for building knowledge bases for natural language processing. Our intelligent agent, the rule suggestion module within KAFTIE, assists the expert by suggesting new rules in order to address incorrect behavior of the current knowledge base. The suggested rules are based on previously entered rules which were “hand-crafted” by the expert. Initial experiments with the new rule suggestion module are very encouraging as they resulted in a more compact knowledge base of comparable quality to a fully hand-crafted knowledge base. At the same time the development time for the more compact knowledge base was considerably reduced.


2019 ◽  
Vol 9 (1) ◽  
pp. 88-106
Author(s):  
Irphan Ali ◽  
Divakar Yadav ◽  
Ashok Kumar Sharma

A question answering system aims to provide correct and prompt answers to users' queries from a knowledge base. With the growth of digital information on the web, effective information retrieval systems are needed more than ever. Most recent question answering systems consult knowledge bases to answer a question, after parsing and transforming natural language queries into knowledge base-executable forms. In this article, the authors propose a semantic web-based approach to question answering that uses natural language processing to analyze and understand the user query. It employs a “Total Answer Relevance Score” to measure the relevance of each answer returned by the system. The results obtained are quite promising. The real-time performance of the system has been evaluated on the answers extracted from the knowledge base.
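The abstract names a “Total Answer Relevance Score” without defining it. Purely as a hypothetical stand-in (this formula is an assumption for illustration, not the authors' metric), one simple way to rank candidate answers is by the fraction of query terms each answer covers:

```python
def relevance_score(query, answer):
    """Hypothetical relevance: fraction of query terms present in the answer.
    An illustrative stand-in, not the paper's actual metric."""
    q_terms = set(query.lower().split())
    a_terms = set(answer.lower().split())
    return len(q_terms & a_terms) / len(q_terms) if q_terms else 0.0

candidates = [
    "Paris is the capital of France",
    "France borders Spain",
]
query = "What is the capital of France"
ranked = sorted(candidates, key=lambda a: relevance_score(query, a), reverse=True)
print(ranked[0])  # the candidate sharing the most query terms ranks first
```

A real system would work over structured KB answers and weighted terms rather than raw token overlap, but the ranking pattern is the same.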


Author(s):  
Kangqi Luo ◽  
Xusheng Luo ◽  
Xianyang Chen ◽  
Kenny Q. Zhu

This paper studies the problem of discovering structured knowledge representations of binary natural language relations. The representation, known as the schema, generalizes the traditional path of predicates to support more complex semantics. We present a search algorithm to generate schemas over a knowledge base, and propose a data-driven learning approach to discover the most suitable representations for a relation. Evaluation results show that the inferred schemas are able to represent precise semantics and can be used to enrich manually crafted knowledge bases.


Doklady BGUIR ◽  
2020 ◽  
Vol 18 (6) ◽  
pp. 49-56
Author(s):  
Q. Longwei

To implement a natural language user interface and intelligent question answering, a knowledge-based semantic model for Chinese language processing is proposed. The article gives careful consideration to existing methods and various knowledge bases for natural language processing. The analysis of these methods has led to the conclusion that in natural language processing, the knowledge base is the most fundamental and crucial part. The knowledge base makes it possible to process a natural language based on initially described knowledge and to explain the processing operations. Based on an analysis of various methods for constructing knowledge bases about the English and Chinese languages, an ontological approach to Chinese language processing was proposed. The Chinese language processing model has two main aspects: the design of a knowledge base about the Chinese language and the development of an ontology-based knowledge processing machine. The proposed approach is aimed at developing a semantic model of knowledge about the Chinese language. As a stage in the implementation of the approach, I designed an ontology of the Chinese language that can be applied for further processing of the language. This paper considers a preliminary version of the ontology and the principle of building a knowledge base about the Chinese language. There are no uniform standards or evaluation system for designing an ontology; its expansion, refinement, and evaluation require further research.


1995 ◽  
Vol 34 (01/02) ◽  
pp. 165-171 ◽  
Author(s):  
J. F. Sowa

Abstract: Knowledge-base design requires a thorough analysis of the concepts to be represented. If the analysis is incomplete or inaccurate, the resulting knowledge base may contain arbitrary restrictions, inconsistent data, or limitations that make future extensions impossible. Conceptual analysis is even more critical for sharing knowledge bases between heterogeneous systems. Analyses that are adequate for independent systems may cause incompatibilities when the systems share data. This paper discusses the kinds of problems that require a careful conceptual analysis and the difficulties caused by an incorrect or incomplete analysis. It also shows how independently developed knowledge bases can be related to one another by reanalyzing and redefining the basic concepts and relations. Such analysis is essential for sharing knowledge between systems developed for different purposes, such as relational and object-oriented databases or expert systems and natural language processors.


Author(s):  
Adeline Kerner ◽  
Régine Vignes Lebbe

Natural sciences need to make assertions about characteristics of taxa. Traits and qualities, or descriptors and states, become increasingly crucial as a resource for identification adapted to both scientists and the public. Specialists, non-specialists, and the general public need different strategies for accessing the information. Creating a knowledge base is time-consuming, and adapting this base to several needs seems to increase the required time substantially. Specialists think that an identification tool requires a complete overhaul when they want to change the target audience or the language... so they do not wish to get involved. How can creation and update time be minimized? How can data be aggregated only once in order to create a single knowledge base with different access levels? Strategies for integrating different patterns of descriptors in a single knowledge base are useful to modulate descriptions without loss of information and with the certainty that everything is up to date in each context. This multi-context knowledge base, derived from a single trait dataset, can generate descriptions adapted to different contexts and users. To address this issue, we propose to use calculated descriptors in order to maintain, in the same knowledge base, different versions of descriptors that are updated automatically when the reference trait is modified. Calculated descriptors are a distinctive feature of Xper3. These descriptors are automatically computed from other descriptors by using logical operators (Boolean operators). Xper3 (http://www.xper3.com) is a web platform that manages descriptive data and provides interactive identification keys. Xper3 and its previous version, Xper2, have already been used for various taxonomic groups. We will focus on fossils in order to reveal how calculated descriptors in Xper3 knowledge bases can solve the multi-context problem. The main source of content is the archaeocyaths knowledge base (http://archaeocyatha.infosyslab.fr).
Archaeocyaths are the first reef-building animals of the Cambrian. They are calcified sponges without spicules. The archaeocyaths knowledge base is an efficient resource for scientific studies and a useful tool for non-specialists, especially with the support of calculated descriptors. Correspondence between archaeocyath and sponge morphologies is not ready yet, but everything will be included in PORO (the Porifera Ontology, an anatomy ontology of sponges, http://purl.obolibrary.org/obo/poro/releases/2014-03-06/) in the short term. In this knowledge base, calculated descriptors are used to: create a consistent multilingual interactive identification key (French and English are available, and Russian is in draft); generate descriptors adapted to different levels of expertise; and reword morphological descriptors (adapted for identification) into homologous characters (adapted for phylogeny). Xper2 and Xper3 are compatible with TDWG’s Structured Descriptive Data (SDD) format. Calculated descriptors do not exist in the SDD format, so they are exported from Xper3 as categorical descriptors, thereby losing the origin of their values. Calculated descriptors are powerful, and we are interested in discussing them with SDD and Xper3 users in order to improve the user interface and develop new tools for the analysis of such descriptors.
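The idea of a calculated descriptor can be sketched in a few lines. This is an illustrative model only: the descriptor names, states, and Boolean rule below are hypothetical, not taken from the archaeocyaths knowledge base or from Xper3 internals. The point is that derived views (simplified or translated descriptors) are computed from one reference descriptor, so updating the reference updates every view:

```python
# Toy specimen descriptions keyed by one expert-level reference descriptor.
# All names and states are hypothetical.
SPECIMENS = {
    "specimen_1": {"intervallum_structure": "septa_only"},
    "specimen_2": {"intervallum_structure": "septa_and_tabulae"},
    "specimen_3": {"intervallum_structure": "none"},
}

def calc_internal_walls(description):
    """Calculated descriptor for non-specialists: 'internal walls present?'
    Derived by a Boolean rule over the expert reference state."""
    state = description["intervallum_structure"]
    return ("septa" in state) or ("tabulae" in state)

# Multilingual labels for the same calculated descriptor: no data duplication,
# only a relabeling of the computed value.
LABELS = {"en": {True: "present", False: "absent"},
          "fr": {True: "présent", False: "absent"}}

for name, desc in SPECIMENS.items():
    value = calc_internal_walls(desc)
    print(name, LABELS["en"][value], "/", LABELS["fr"][value])
```

Because the simplified and translated views are functions of the reference descriptor rather than copies of it, every context stays up to date automatically, which is the multi-context property the abstract describes.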


1998 ◽  
Vol 37 (04/05) ◽  
pp. 327-333 ◽  
Author(s):  
F. Buekens ◽  
G. De Moor ◽  
A. Waagmeester ◽  
W. Ceusters

Abstract: Natural language understanding systems have to exploit various kinds of knowledge in order to represent the meaning behind texts. Getting this knowledge in place is often such a huge enterprise that it is tempting to look for systems that can discover such knowledge automatically. We describe how the distinction between conceptual and linguistic semantics may assist in reaching this objective, provided that distinguishing between them is not done too rigorously. We present several examples to support this view and argue that in a multilingual environment, linguistic ontologies should be designed as interfaces between domain conceptualizations and linguistic knowledge bases.


2020 ◽  
Author(s):  
Matheus Pereira Lobo

This paper highlights two categories of knowledge bases: one built as a repository of links, the other based on units of knowledge.

