Integrating corpus data in dynamic knowledge bases

Terminology ◽  
2008 ◽  
Vol 14 (2) ◽  
pp. 159-182 ◽  
Author(s):  
Maribel Tercedor ◽  
Clara I. López Rodríguez

Terminological information is a key element in the construction of a knowledge base. In order for a knowledge base to be useful to different users, terminological information should be extracted from corpora so as to reflect the different pragmatic nuances. Puertoterm is a knowledge base in the field of Coastal Engineering, which has made use of corpus information to develop terminological entries. It also includes contextual information in such a way that this information interacts with other elements of the knowledge base. We describe the methodology followed in the project regarding corpus design, retrieval of lexical information, conceptual organization of the domain of Coastal Engineering, and the elaboration of terminological entries.

2009 ◽  
Vol 50 (4) ◽  
Author(s):  
Pamela Faber Benítez ◽  
Carlos Márquez Linares ◽  
Miguel Vega Expósito

Abstract The frame notion used in Frame Semantics can be traced to case frames, which were said to characterize a small abstract situation in such a way that if one wished to understand the semantic structure of a verb it was necessary to understand the properties of the entire scene that it activated. A frame has been more broadly defined as any system of concepts related in such a way that one concept evokes the entire system. In this sense, it bears an obvious affinity with terminology, which is also based on such conceptual organization. However, despite the fact that Frame Semantics has been usefully applied to lexicology and syntax, so far it has not been systematically applied to terminology. This paper argues for a frame-based organization of specialized fields in which a dynamic process-oriented frame provides the conceptual underpinnings for the location of sub-hierarchies of concepts within a specialized domain event, and the elaboration of a definition template, thus opening the door to a more adequate representation of specialized fields as well as supplying a better way of linking terms to concepts. The domain of coastal engineering is used as an example because the entities in play take part in processes that are difficult to describe only by means of conceptual trees. Through the use of corpus data we demonstrate how it is possible to represent such an event and create a dynamic frame which enriches and enhances the understanding of specialized field concepts.


2020 ◽  
Author(s):  
Matheus Pereira Lobo

This paper is about highlighting two categories of knowledge bases, one built as a repository of links, and other based on units of knowledge.


2018 ◽  
Vol 2 ◽  
pp. e25614 ◽  
Author(s):  
Florian Pellen ◽  
Sylvain Bouquin ◽  
Isabelle Mougenot ◽  
Régine Vignes-Lebbe

Xper3 (Vignes Lebbe et al. 2016) is a collaborative knowledge base publishing platform that, since its launch in november 2013, has been adopted by over 2 thousand users (Pinel et al. 2017). This is mainly due to its user friendly interface and the simplicity of its data model. The data are stored in MySQL Relational DBs, but the exchange format uses the TDWG standard format SDD (Structured Descriptive DataHagedorn et al. 2005). However, each Xper3 knowledge base is a closed world that the author(s) may or may not share with the scientific community or the public via publishing content and/or identification key (Kopfstein 2016). The explicit taxonomic, geographic and phenotypic limits of a knowledge base are not always well defined in the metadata fields. Conversely terminology vocabularies, such as Phenotype and Trait Ontology PATO and the Plant Ontology PO, and software to edit them, such as Protégé and Phenoscape, are essential in the semantic web, but difficult to handle for biologist without computer skills. These ontologies constitute open worlds, and are expressed themselves by RDF triples (Resource Description Framework). Protégé offers vizualisation and reasoning capabilities for these ontologies (Gennari et al. 2003, Musen 2015). Our challenge is to combine the user friendliness of Xper3 with the expressive power of OWL (Web Ontology Language), the W3C standard for building ontologies. We therefore focused on analyzing the representation of the same taxonomic contents under Xper3 and under different models in OWL. After this critical analysis, we chose a description model that allows automatic export of SDD to OWL and can be easily enriched. We will present the results obtained and their validation on two knowledge bases, one on parasitic crustaceans (Sacculina) and the second on current ferns and fossils (Corvez and Grand 2014). The evolution of the Xper3 platform and the perspectives offered by this link with semantic web standards will be discussed.


Author(s):  
Heiko Paulheim ◽  
Christian Bizer

Linked Data on the Web is either created from structured data sources (such as relational databases), from semi-structured sources (such as Wikipedia), or from unstructured sources (such as text). In the latter two cases, the generated Linked Data will likely be noisy and incomplete. In this paper, we present two algorithms that exploit statistical distributions of properties and types for enhancing the quality of incomplete and noisy Linked Data sets: SDType adds missing type statements, and SDValidate identifies faulty statements. Neither of the algorithms uses external knowledge, i.e., they operate only on the data itself. We evaluate the algorithms on the DBpedia and NELL knowledge bases, showing that they are both accurate as well as scalable. Both algorithms have been used for building the DBpedia 3.9 release: With SDType, 3.4 million missing type statements have been added, while using SDValidate, 13,000 erroneous RDF statements have been removed from the knowledge base.


Author(s):  
Yongrui Chen ◽  
Huiying Li ◽  
Yuncheng Hua ◽  
Guilin Qi

Formal query building is an important part of complex question answering over knowledge bases. It aims to build correct executable queries for questions. Recent methods try to rank candidate queries generated by a state-transition strategy. However, this candidate generation strategy ignores the structure of queries, resulting in a considerable number of noisy queries. In this paper, we propose a new formal query building approach that consists of two stages. In the first stage, we predict the query structure of the question and leverage the structure to constrain the generation of the candidate queries. We propose a novel graph generation framework to handle the structure prediction task and design an encoder-decoder model to predict the argument of the predetermined operation in each generative step. In the second stage, we follow the previous methods to rank the candidate queries. The experimental results show that our formal query building approach outperforms existing methods on complex questions while staying competitive on simple questions.


2016 ◽  
Vol 31 (2) ◽  
pp. 97-123 ◽  
Author(s):  
Alfred Krzywicki ◽  
Wayne Wobcke ◽  
Michael Bain ◽  
John Calvo Martinez ◽  
Paul Compton

AbstractData mining techniques for extracting knowledge from text have been applied extensively to applications including question answering, document summarisation, event extraction and trend monitoring. However, current methods have mainly been tested on small-scale customised data sets for specific purposes. The availability of large volumes of data and high-velocity data streams (such as social media feeds) motivates the need to automatically extract knowledge from such data sources and to generalise existing approaches to more practical applications. Recently, several architectures have been proposed for what we callknowledge mining: integrating data mining for knowledge extraction from unstructured text (possibly making use of a knowledge base), and at the same time, consistently incorporating this new information into the knowledge base. After describing a number of existing knowledge mining systems, we review the state-of-the-art literature on both current text mining methods (emphasising stream mining) and techniques for the construction and maintenance of knowledge bases. In particular, we focus on mining entities and relations from unstructured text data sources, entity disambiguation, entity linking and question answering. We conclude by highlighting general trends in knowledge mining research and identifying problems that require further research to enable more extensive use of knowledge bases.


2010 ◽  
Vol 42 (7) ◽  
pp. 1928-1950 ◽  
Author(s):  
Arianne Reimerink ◽  
Mercedes García de Quesada ◽  
Silvia Montero-Martínez

2015 ◽  
Vol 2015 ◽  
pp. 1-11 ◽  
Author(s):  
Alexander Falenski ◽  
Armin A. Weiser ◽  
Christian Thöns ◽  
Bernd Appel ◽  
Annemarie Käsbohrer ◽  
...  

In case of contamination in the food chain, fast action is required in order to reduce the numbers of affected people. In such situations, being able to predict the fate of agents in foods would help risk assessors and decision makers in assessing the potential effects of a specific contamination event and thus enable them to deduce the appropriate mitigation measures. One efficient strategy supporting this is using model based simulations. However, application in crisis situations requires ready-to-use and easy-to-adapt models to be available from the so-called food safety knowledge bases. Here, we illustrate this concept and its benefits by applying the modular open source software tools PMM-Lab and FoodProcess-Lab. As a fictitious sample scenario, an intentional ricin contamination at a beef salami production facility was modelled. Predictive models describing the inactivation of ricin were reviewed, relevant models were implemented with PMM-Lab, and simulations on residual toxin amounts in the final product were performed with FoodProcess-Lab. Due to the generic and modular modelling concept implemented in these tools, they can be applied to simulate virtually any food safety contamination scenario. Apart from the application in crisis situations, the food safety knowledge base concept will also be useful in food quality and safety investigations.


Author(s):  
Christopher Walton

In the introductory chapter of this book, we discussed the means by which knowledge can be made available on the Web. That is, the representation of the knowledge in a form by which it can be automatically processed by a computer. To recap, we identified two essential steps that were deemed necessary to achieve this task: 1. We discussed the need to agree on a suitable structure for the knowledge that we wish to represent. This is achieved through the construction of a semantic network, which defines the main concepts of the knowledge, and the relationships between these concepts. We presented an example network that contained the main concepts to differentiate between kinds of cameras. Our network is a conceptualization, or an abstract view of a small part of the world. A conceptualization is defined formally in an ontology, which is in essence a vocabulary for knowledge representation. 2. We discussed the construction of a knowledge base, which is a store of knowledge about a domain in machine-processable form; essentially a database of knowledge. A knowledge base is constructed through the classification of a body of information according to an ontology. The result will be a store of facts and rules that describe the domain. Our example described the classification of different camera features to form a knowledge base. The knowledge base is expressed formally in the language of the ontology over which it is defined. In this chapter we elaborate on these two steps to show how we can define ontologies and knowledge bases specifically for the Web. This will enable us to construct Semantic Web applications that make use of this knowledge. The chapter is devoted to a detailed explanation of the syntax and pragmatics of the RDF, RDFS, and OWL Semantic Web standards. The resource description framework (RDF) is an established standard for knowledge representation on the Web. Taken together with the associated RDF Schema (RDFS) standard, we have a language for representing simple ontologies and knowledge bases on the Web.


Sign in / Sign up

Export Citation Format

Share Document