Bayes-ReCCE

Author(s):  
Brian Walshe ◽  
Rob Brennan ◽  
Declan O'Sullivan

Linked Open Data consists of a large set of structured data knowledge bases which have been linked together, typically using equivalence statements. These equivalences usually take the form of owl:sameAs statements linking individuals, but links between classes are far less common. Often, the lack of linking between classes is because the relationships cannot be described as elementary one-to-one equivalences. Instead, complex correspondences referencing multiple entities in logical combinations are often necessary to describe how the classes in one ontology are related to the classes in a second ontology. In this paper the authors introduce a novel Bayesian Restriction Class Correspondence Estimation (Bayes-ReCCE) algorithm, an extensional approach to detecting complex correspondences between classes. Bayes-ReCCE operates by analysing features of matched individuals in the knowledge bases, and uses Bayesian inference to search for complex correspondences between the classes these individuals belong to. Bayes-ReCCE is designed to provide meaningful results even when only small numbers of matched instances are available. The authors demonstrate this capability empirically, showing that the complex correspondences generated by Bayes-ReCCE have a median F1 score of over 0.75 when compared against a gold standard set of complex correspondences between Linked Open Data knowledge bases covering the geographical and cinema domains. In addition, the authors discuss how metadata produced by Bayes-ReCCE can be included in the correspondences to encourage reuse, allowing users to make more informed decisions about the meaning of the relationship described in each correspondence.
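The abstract describes the approach only at a high level. As a purely illustrative sketch of the extensional idea (not the authors' implementation), candidate restriction classes of the form "individuals with property p = value v" can be scored against a target class using a Beta-Binomial posterior over the matched instances; all names below are hypothetical:

```python
def candidate_scores(matched_pairs, target_class, alpha=1.0, beta=1.0):
    """matched_pairs: list of (features_a, classes_b), where features_a is
    the set of (property, value) features of an individual in ontology A
    and classes_b is the set of classes its owl:sameAs counterpart belongs
    to in ontology B."""
    all_features = set().union(*(f for f, _ in matched_pairs))
    in_target = [f for f, cs in matched_pairs if target_class in cs]
    out_target = [f for f, cs in matched_pairs if target_class not in cs]
    scores = {}
    for feature in all_features:
        k = sum(feature in f for f in in_target)       # restriction members inside the target class
        n = k + sum(feature in f for f in out_target)  # restriction members overall
        # Posterior mean of P(target | restriction) under a Beta(alpha, beta)
        # prior; the prior keeps estimates meaningful when matches are scarce.
        scores[feature] = (k + alpha) / (n + alpha + beta)
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

High-scoring features suggest correspondences such as "the target class is equivalent to the restriction class p = v", which is the kind of complex correspondence the algorithm reports.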

Author(s):  
Lyubomir Penev ◽  
Teodor Georgiev ◽  
Viktor Senderov ◽  
Mariya Dimitrova ◽  
Pavel Stoev

As one of the first advocates of open access and open data in the field of biodiversity publishing, Pensoft has adopted a multiple data publishing model, resulting in the ARPHA-BioDiv toolbox (Penev et al. 2017). ARPHA-BioDiv consists of several data publishing workflows and tools described in the Strategies and Guidelines for Publishing of Biodiversity Data and elsewhere:

1. Data underlying research results are deposited in an external repository and/or published as supplementary file(s) to the article and then linked/cited in the article text; supplementary files are published under their own DOIs and bear their own citation details.
2. Data are deposited in trusted repositories and/or as supplementary files and described in data papers; data papers may be submitted in text format or converted into manuscripts from Ecological Metadata Language (EML) metadata.
3. Integrated narrative and data publishing is realised by the Biodiversity Data Journal, where structured data are imported into the article text from tables or via web services and downloaded/distributed from the published article.
4. Data are published in structured, semantically enriched, full-text XML, so that several data elements can thereafter easily be harvested by machines.
5. Linked Open Data (LOD) are extracted from literature, converted into interoperable RDF triples in accordance with the OpenBiodiv-O ontology (Senderov et al. 2018) and stored in the OpenBiodiv Biodiversity Knowledge Graph.
The above-mentioned approaches are supported by a whole ecosystem of additional workflows and tools, for example: (1) pre-publication data auditing, involving both human and machine data quality checks (workflow 2); (2) web-service integration with data repositories and data centres, such as the Global Biodiversity Information Facility (GBIF), Barcode of Life Data Systems (BOLD), Integrated Digitized Biocollections (iDigBio), Data Observation Network for Earth (DataONE), Long Term Ecological Research (LTER), PlutoF, Dryad, and others (workflows 1, 2); (3) semantic markup of the article texts in the TaxPub format, facilitating further extraction, distribution and re-use of sub-article elements and data (workflows 3, 4); (4) server-to-server import of specimen data from GBIF, BOLD, iDigBio and PlutoF into manuscript text (workflow 3); (5) automated conversion of EML metadata into data paper manuscripts (workflow 2); (6) export of Darwin Core Archives and automated deposition in GBIF (workflow 3); (7) submission of individual images and supplementary data under their own DOIs to the Biodiversity Literature Repository, BLR (workflows 1-3); (8) conversion of key data elements from TaxPub articles and taxonomic treatments extracted by Plazi into RDF handled by OpenBiodiv (workflow 5). These approaches represent different aspects of the prospective scholarly publishing of biodiversity data, which, in combination with text and data mining (TDM) technologies for legacy literature (PDF) developed by Plazi, lay the groundwork for an entire data publishing ecosystem for biodiversity, supplying FAIR (Findable, Accessible, Interoperable and Reusable) data to several interoperable overarching infrastructures, such as GBIF, BLR, Plazi TreatmentBank and OpenBiodiv, and to various end users.
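To make workflow 5 concrete, below is a minimal sketch of the literature-to-LOD conversion step using rdflib; the namespace and term names are placeholders rather than the actual OpenBiodiv-O IRIs (Senderov et al. 2018):

```python
# Illustrative only: turn one data element extracted from a TaxPub article
# into RDF triples. The "OB" namespace and its terms are placeholders,
# not the real OpenBiodiv-O vocabulary.
from rdflib import Graph, Literal, Namespace, RDF, URIRef

OB = Namespace("http://example.org/openbiodiv/")  # placeholder namespace

g = Graph()
treatment = URIRef("http://example.org/treatment/123")  # hypothetical ID
g.add((treatment, RDF.type, OB.TaxonomicTreatment))
g.add((treatment, OB.mentionsTaxon, Literal("Aus bus L., 1758")))
print(g.serialize(format="turtle"))
```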


Author(s):  
Khayra Bencherif ◽  
Mimoun Malki ◽  
Djamel Amar Bensaber

This article describes how the Linked Open Data Cloud project allows data providers to publish structured data on the web according to the Linked Data principles. In this context, several link discovery frameworks have been developed for connecting entities contained in knowledge bases. In order to achieve high effectiveness in the link discovery task, a suitable link configuration is required to specify the similarity conditions. Unfortunately, such configurations are specified manually, which makes the link discovery task tedious and difficult for users. In this article, the authors address this drawback by proposing a novel approach for the automatic determination of link specifications. The proposed approach is based on a neural network model that combines a set of existing metrics into a compound one. The authors evaluate the effectiveness of the proposed approach in three experiments using real data sets from the LOD Cloud. In addition, the proposed approach is compared against existing link specification approaches, and it outperforms them in most experiments.
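As a hedged sketch of the core idea (the article's own model and code are not reproduced here), a single-layer logistic model can learn weights that combine several similarity metrics into one compound similarity from labelled match/non-match entity pairs:

```python
import math
import random

def train_compound_metric(pairs, labels, epochs=200, lr=0.1):
    """pairs: one vector of metric scores per entity pair, e.g.
    [levenshtein_sim, jaccard_sim, numeric_sim]; labels: 1.0 for a
    correct link, 0.0 otherwise. All names here are illustrative."""
    dim = len(pairs[0])
    w = [random.uniform(-0.1, 0.1) for _ in range(dim)]
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(pairs, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # compound similarity in (0, 1)
            g = p - y                        # gradient of the log-loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b  # the weights define the learned link specification
```

Entity pairs whose compound similarity exceeds a chosen threshold are emitted as links, replacing the manually tuned similarity conditions.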


Author(s):  
Brian Walshe ◽  
Rob Brennan ◽  
Declan O'Sullivan

Linked Data consists of many structured data knowledge bases that have been interlinked, often using equivalence statements. These equivalences usually take the form of owl:sameAs statements linking individuals; links between classes are far less common. Often, the lack of class links is because their relationships cannot be described as one-to-one equivalences. Instead, complex correspondences referencing logical combinations of multiple entities are often needed to describe how the classes in one ontology are related to classes in a second ontology. This chapter introduces a novel Bayesian Restriction Class Correspondence Estimation (Bayes-ReCCE) algorithm, an extensional approach to detecting complex correspondences between classes. Bayes-ReCCE operates by analysing features of matched individuals in the knowledge bases, and uses Bayesian inference to search for complex correspondences between the classes these individuals belong to. Bayes-ReCCE is designed to provide meaningful results even when only small numbers of matched instances are available.


2018 ◽  
Vol 45 (6) ◽  
pp. 756-766 ◽  
Author(s):  
Gustavo Candela ◽  
Pilar Escobar ◽  
Rafael C Carrasco ◽  
Manuel Marco-Such

Cultural heritage institutions have recently begun to consider the benefits of sharing their collections using linked open data to disseminate and enrich their metadata. As datasets become very large, challenges appear, such as ingestion, management, querying and enrichment. Furthermore, each institution has particular features related to important aspects such as vocabularies and interoperability, which make it difficult to generalise this process and provide one-size-fits-all solutions. To improve the user experience of information retrieval systems, researchers have identified that further refinements are required for the recognition and extraction of implicit relationships expressed in natural language. We introduce a framework for the enrichment and disambiguation of locations in text using open knowledge bases such as Wikidata and GeoNames. The framework has been successfully used to publish a dataset based on information from the Biblioteca Virtual Miguel de Cervantes, illustrating how semantic enrichment can help information retrieval. The methods used to automate the enrichment process, built upon open-source software components, are described herein.
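As one hedged example of such an enrichment step, a recognised place name can be looked up against the public Wikidata SPARQL endpoint to retrieve candidate entities and their coordinates for later disambiguation (this sketches the general technique, not the authors' exact pipeline):

```python
from SPARQLWrapper import SPARQLWrapper, JSON

def wikidata_candidates(place_name, lang="es", limit=5):
    """Return (entity IRI, coordinates) candidates for a place name;
    wdt:P625 is Wikidata's 'coordinate location' property."""
    sparql = SPARQLWrapper("https://query.wikidata.org/sparql")
    sparql.setQuery(f"""
        SELECT ?item ?coord WHERE {{
          ?item rdfs:label "{place_name}"@{lang} ;
                wdt:P625 ?coord .
        }} LIMIT {limit}""")
    sparql.setReturnFormat(JSON)
    rows = sparql.query().convert()["results"]["bindings"]
    return [(r["item"]["value"], r["coord"]["value"]) for r in rows]
```

The ambiguous cases, where several candidates share one name, are where disambiguation against context, e.g. nearby GeoNames entries, must take over.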


2020 ◽  
pp. 016555152093091
Author(s):  
José Luis Sánchez-Cervantes ◽  
Giner Alor-Hernández ◽  
Mario Andrés Paredes-Valverde ◽  
Lisbeth Rodríguez-Mazahua ◽  
Rafael Valencia-García

Mobile devices are the technological basis of computational intelligent systems, yet traditional mobile application interfaces tend to rely only on the touch modality. Such interfaces could improve human–computer interaction by combining diverse interaction modalities, such as visual, auditory and touch. Moreover, much of the information on the Web is published under the Linked Data principles to allow people and computers to share, use and/or reuse high-quality information; however, current tools for searching, browsing and visualising this kind of data are not fully developed. The goal of this research is to propose a novel architecture, called NaLa-Search, for effectively exploring the Linked Open Data cloud. We present a mobile application that combines voice commands and touch for browsing and searching such semantic information through faceted search, a widely used interaction scheme for exploratory search that preserves the richness of the data while remaining practical for real-world use. NaLa-Search was evaluated by real users from the clinical pharmacology domain, who had to search and navigate the DrugBank dataset through voice commands. The evaluation results show that faceted search combined with multiple interaction modalities (e.g. speech and touch) can enhance users' interaction with semantic knowledge bases.
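The faceted part of the interaction follows a standard pattern, sketched below with hypothetical field names (not the actual DrugBank schema): after a voice query returns a result set, facet values are counted so the interface can offer them as narrowing filters.

```python
from collections import Counter

def facet_counts(results, facet_fields):
    """results: list of dicts such as {"name": "Aspirin", "category": "NSAID"};
    returns, for each facet field, a count of each value in the result set."""
    return {f: Counter(r[f] for r in results if f in r) for f in facet_fields}

def apply_facet(results, field, value):
    """Narrow the result set to one facet value (e.g. one chosen by voice)."""
    return [r for r in results if r.get(field) == value]
```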


1994 ◽  
Vol 33 (05) ◽  
pp. 454-463 ◽  
Author(s):  
A. M. van Ginneken ◽  
J. van der Lei ◽  
J. H. van Bemmel ◽  
P. W. Moorman

Abstract: Clinical narratives in patient records are usually recorded in free text, limiting the use of this information for research, quality assessment, and decision support. This study focuses on the capture of clinical narratives in a structured format by supporting physicians with structured data entry (SDE). We analyzed and made explicit which requirements SDE should meet to be acceptable to the physician on the one hand, and to generate unambiguous patient data on the other. Starting from these requirements, we found that in order to support SDE, the knowledge on which it is based needs to be made explicit: we refer to this knowledge as descriptional knowledge. We articulate the nature of this knowledge and propose a model in which it can be formally represented. The model allows the construction of specific knowledge bases, each representing the knowledge needed to support SDE within a circumscribed domain. Data entry is made possible through a general entry program, whose behavior is determined by a combination of user input and the content of the applicable domain knowledge base. We clarify how descriptional knowledge is represented, modeled, and used for data entry to achieve SDE that meets the proposed requirements.
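The division of labour the study proposes, a generic entry program steered by a declarative domain knowledge base, can be sketched as follows; the knowledge fragment and prompts are invented for illustration and do not come from the paper:

```python
# Hypothetical fragment of "descriptional knowledge" for one domain.
# The entry loop below is generic: it knows nothing about the domain
# and is driven entirely by the knowledge base it is given.
DYSPNEA_KB = {
    "dyspnea": {"severity": ["mild", "moderate", "severe"],
                "onset": ["sudden", "gradual"]},
    "cough": {"character": ["dry", "productive"]},
}

def structured_entry(kb):
    record = {}
    for finding, attributes in kb.items():
        if input(f"{finding} present? [y/n] ").strip().lower() != "y":
            continue
        record[finding] = {attr: input(f"  {attr} {options}: ")
                           for attr, options in attributes.items()}
    return record  # structured, unambiguous data instead of free text
```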

