Assisting Biologists in Editing Taxonomic Information by Confronting Multiple Data Sources using Linked Data Standards

Author(s):  
Franck Michel ◽  
Catherine Faron-Zucker ◽  
Sandrine Tercerie ◽  
Antonia Ettorre ◽  
Olivier Gargominy

During the last decade, Web APIs (Application Programming Interfaces) have gained significant traction, to the extent that they have become a de facto standard for HTTP-based, machine-processable data access. Despite this success, however, they still often fail to make data interoperable, insofar as they commonly rely on proprietary data models and vocabularies that lack the formal semantic descriptions essential to reliable data integration. In the biodiversity domain, multiple data aggregators, such as the Global Biodiversity Information Facility (GBIF) and the Encyclopedia of Life (EoL), maintain specialized Web APIs giving access to billions of records about taxonomies, occurrences, or life traits (Triebel et al. 2012). They publish data sets spanning complementary and often overlapping regions, epochs or domains, but may also report or rely on potentially conflicting perspectives, e.g. with respect to the circumscription of taxonomic concepts. It is therefore of utmost importance for biologists and collection curators to be able to confront the knowledge they have about taxa with related data coming from third-party data sources.

To tackle this issue, the French National Museum of Natural History (MNHN) has developed an application to edit TAXREF, the French taxonomic register for fauna, flora and fungi (Gargominy et al. 2018). TAXREF registers all species recorded in metropolitan France and overseas territories, accounting for 260,000+ biological taxa (200,000+ species) along with 570,000+ scientific names. The TAXREF-Web application compares the data available in TAXREF with corresponding data from third-party data sources, points out disagreements, and allows biologists to add to, remove from or amend TAXREF accordingly. This requires TAXREF-Web developers to write a specific piece of code for each Web API considered, in order to align the TAXREF representation with its Web API counterpart. This task is time-consuming and makes maintenance of the web application cumbersome.

In this presentation, we report on a new implementation of TAXREF-Web that harnesses the Linked Data standards: the Resource Description Framework (RDF), the Semantic Web format for representing knowledge graphs, and SPARQL, the W3C standard language for querying RDF graphs. In addition, we leverage the SPARQL Micro-Service architecture (Michel et al. 2018), a lightweight approach to querying Web APIs with SPARQL. A SPARQL micro-service is a SPARQL endpoint that wraps a Web API service; it typically produces a small, resource-centric RDF graph by invoking the Web API and transforming the response into RDF triples. We developed SPARQL micro-services to wrap the Web APIs of GBIF, the World Register of Marine Species (WoRMS), FishBase, Index Fungorum, the Pan-European Species directories Infrastructure (PESI), ZooBank, the International Plant Names Index (IPNI), EoL, Tropicos and Sandre. These micro-services consistently translate Web API responses into RDF graphs using mainly two widely adopted vocabularies: Schema.org (Guha et al. 2015) and Darwin Core (Baskauf et al. 2015). This approach brings two major advantages. First, the broad adoption of Schema.org and Darwin Core ensures that the services can be immediately understood and reused by a large audience within the biodiversity community. Second, wrapping all these Web APIs in SPARQL micro-services “suddenly” makes them technically and semantically interoperable, since they all represent resources (taxa, habitats, traits, etc.) in a common manner.

Consequently, the integration task is simplified: confronting data from multiple sources essentially consists of writing the appropriate SPARQL queries, which makes web application development and maintenance easier. We present several concrete cases in which we use this approach to detect disagreements between TAXREF and the aforementioned data sources with respect to taxonomic information (author, synonymy, vernacular names, classification, taxonomic rank), habitats, bibliographic references, species interactions and life traits.
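As a hedged illustration of how such a comparison might be written against one of these micro-services (the endpoint URL below is a placeholder and the exact graph shape may differ from the deployed services), the following Python sketch queries a SPARQL micro-service wrapping the GBIF API and compares the authorship it returns with the value recorded in TAXREF:

```python
# Minimal sketch (hypothetical endpoint URL and graph shape): query a SPARQL
# micro-service wrapping the GBIF API and compare the authorship it reports
# for a scientific name with the authorship recorded in TAXREF.
from SPARQLWrapper import SPARQLWrapper, JSON

GBIF_MS_ENDPOINT = "https://example.org/sparql-ms/gbif/getTaxonByName"  # placeholder

QUERY = """
PREFIX dwc: <http://rs.tdwg.org/dwc/terms/>
SELECT ?name ?authorship WHERE {
  ?taxon dwc:scientificName ?name ;
         dwc:scientificNameAuthorship ?authorship .
  FILTER (?name = "Delphinus delphis")
}
"""

def gbif_authorship():
    sparql = SPARQLWrapper(GBIF_MS_ENDPOINT)
    sparql.setQuery(QUERY)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    return {b["authorship"]["value"] for b in results["results"]["bindings"]}

taxref_authorship = "Linnaeus, 1758"   # value held in TAXREF for this taxon
remote = gbif_authorship()
if taxref_authorship not in remote:
    print(f"Disagreement: TAXREF says '{taxref_authorship}', GBIF says {remote or 'nothing'}")
```

Because every micro-service exposes the same Darwin Core / Schema.org terms, the same query pattern can be reused against WoRMS, FishBase or PESI by simply changing the endpoint.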

2016 ◽  
Vol 28 (2) ◽  
pp. 241-251 ◽  
Author(s):  
Luciane Lena Pessanha Monteiro ◽  
Mark Douglas de Azevedo Jacyntho

The study addresses the use of the Semantic Web and Linked Data principles proposed by the World Wide Web Consortium for the development of a Web application for the semantic management of scanned documents. The main goal is to record scanned documents, describing them in a way that machines can understand and process, filtering content and assisting users in searching for such documents when a decision-making process is under way. To this end, machine-understandable metadata, created through the use of reference Linked Data ontologies, are associated with the documents, creating a knowledge base. To further enrich the process, a (semi-)automatic mashup of these metadata with data from the Web of Linked Data is carried out, considerably increasing the scope of the knowledge base and enabling new data related to the content of the stored documents to be extracted from the Web and combined, without the user making any effort or perceiving the complexity of the whole process.
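As a minimal sketch of the kind of document metadata involved (the vocabulary choice and URIs below are illustrative assumptions, not the authors' exact model), the following Python/rdflib snippet describes a scanned document with Dublin Core terms and links it to a DBpedia resource so that later mashup steps can pull in related Linked Data:

```python
# Illustrative sketch: attach machine-understandable metadata to a scanned
# document using Dublin Core and FOAF, and link it to a DBpedia resource.
from rdflib import Graph, Namespace, URIRef, Literal
from rdflib.namespace import DCTERMS, FOAF, RDF

EX = Namespace("http://example.org/docs/")          # placeholder namespace

g = Graph()
doc = EX["invoice-2016-0042"]                       # hypothetical document URI
g.add((doc, RDF.type, FOAF.Document))
g.add((doc, DCTERMS.title, Literal("Scanned invoice #0042")))
g.add((doc, DCTERMS.created, Literal("2016-03-15")))
g.add((doc, DCTERMS.subject, URIRef("http://dbpedia.org/resource/Invoice")))

print(g.serialize(format="turtle"))
```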


Author(s):  
Heiko Paulheim ◽  
Christian Bizer

Linked Data on the Web is either created from structured data sources (such as relational databases), from semi-structured sources (such as Wikipedia), or from unstructured sources (such as text). In the latter two cases, the generated Linked Data will likely be noisy and incomplete. In this paper, we present two algorithms that exploit statistical distributions of properties and types for enhancing the quality of incomplete and noisy Linked Data sets: SDType adds missing type statements, and SDValidate identifies faulty statements. Neither of the algorithms uses external knowledge, i.e., they operate only on the data itself. We evaluate the algorithms on the DBpedia and NELL knowledge bases, showing that they are both accurate and scalable. Both algorithms have been used for building the DBpedia 3.9 release: with SDType, 3.4 million missing type statements have been added, while with SDValidate, 13,000 erroneous RDF statements have been removed from the knowledge base.
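To make the intuition behind SDType concrete, here is a simplified Python sketch (the property weights and type distributions are toy numbers, not the DBpedia statistics used in the paper): each property carries a statistical distribution over the types of its subjects, and a resource's candidate types are scored by a weighted average of the distributions of the properties it occurs with.

```python
# Simplified sketch of the SDType idea with invented numbers.
from collections import defaultdict

# P(type | resource is the subject of property p), estimated from the data set
type_dist = {
    "dbo:location":        {"dbo:Place": 0.85, "dbo:Organisation": 0.10},
    "dbo:populationTotal": {"dbo:Place": 0.95},
    "dbo:author":          {"dbo:Work": 0.80, "dbo:Software": 0.15},
}
# Weight of each property, reflecting how discriminative it is
weight = {"dbo:location": 0.4, "dbo:populationTotal": 0.9, "dbo:author": 0.7}

def sdtype_scores(properties):
    """Score candidate types for a resource used with the given properties."""
    scores, total_w = defaultdict(float), 0.0
    for p in properties:
        w = weight.get(p, 0.0)
        total_w += w
        for t, prob in type_dist.get(p, {}).items():
            scores[t] += w * prob
    return {t: s / total_w for t, s in scores.items()} if total_w else {}

# A resource with dbo:location and dbo:populationTotal is very likely a Place
print(sdtype_scores(["dbo:location", "dbo:populationTotal"]))
```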


2021 ◽  
Vol 81 (3-4) ◽  
pp. 318-358
Author(s):  
Sander Stolk

This article provides an introduction to the web application Evoke. This application offers functionality to navigate, view, extend, and analyse thesaurus content. The thesauri that can be navigated in Evoke are expressed in Linguistic Linked Data, an interoperable data form that enables the extension of thesaurus content with custom labels and allows thesaurus content to be linked to other digital resources. As such, Evoke is a powerful research tool that enables its users to perform novel cultural-linguistic analyses over multiple sources. The article further demonstrates the potential of Evoke by discussing how A Thesaurus of Old English was made available in the application and how this has already been adopted in the field of Old English studies. Lastly, the author situates Evoke within a number of recent developments in the field of Digital Humanities and its applications to onomasiological research.
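As a rough sketch of what such interoperable thesaurus data can look like (a plain SKOS encoding is assumed here for illustration; the actual Linguistic Linked Data model used by Evoke may differ), the following Python/rdflib snippet encodes a thesaurus category with an Old English lexical item and a custom note added by a user:

```python
# Minimal sketch, assuming a plain SKOS encoding of a thesaurus category.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import SKOS, RDF

TOE = Namespace("http://example.org/toe/")   # placeholder namespace

g = Graph()
cat = TOE["war"]
g.add((cat, RDF.type, SKOS.Concept))
g.add((cat, SKOS.prefLabel, Literal("War", lang="en")))
g.add((cat, SKOS.altLabel, Literal("wig", lang="ang")))      # Old English word
g.add((cat, SKOS.editorialNote, Literal("custom label added for analysis")))

print(g.serialize(format="turtle"))
```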


2018 ◽  
Vol 52 (3) ◽  
pp. 405-423 ◽  
Author(s):  
Riccardo Albertoni ◽  
Monica De Martino ◽  
Paola Podestà

Purpose – The purpose of this paper is to focus on the quality of the connections (linksets) among thesauri published as Linked Data on the Web. It extends the cross-walking measures with two new measures able to evaluate the enrichment brought by the information reached through a linkset (lexical enrichment, browsing space enrichment). It fosters the adoption of cross-walking linkset quality measures besides the well-known and widely deployed cardinality-based measures (linkset cardinality and linkset coverage).

Design/methodology/approach – The paper applies the linkset measures to the Linked Thesaurus fRamework for Environment (LusTRE). LusTRE is selected as a testbed as it is encoded using the Simple Knowledge Organisation System (SKOS), published as Linked Data, and it explicitly exploits the cross-walking measures on its validated linksets.

Findings – The application to LusTRE offers an insight into the complementarity of the considered linkset measures. In particular, it shows that the cross-walking measures deepen the cardinality-based measures by analysing quality facets that were not previously considered. The actual value of LusTRE's linksets regarding the improvement of multilingualism and concept spaces is assessed.

Research limitations/implications – The paper considers skos:exactMatch linksets, which are a rather specific but quite common kind of linkset. The cross-walking measures explicitly assume the correctness and completeness of linksets. Third-party approaches and tools can help to meet these assumptions.

Originality/value – This paper fulfils an identified need to study the quality of linksets. Several approaches formalise and evaluate Linked Data quality focusing on data set quality but disregarding the other essential component: the connections among data sets.
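A toy Python illustration of the cardinality-based measures and of the kind of enrichment the cross-walking measures aim to capture (the definitions are paraphrased for illustration, not the authors' exact formulas):

```python
# Toy illustration over a skos:exactMatch linkset between a source
# thesaurus S and a target thesaurus T, with invented concepts and labels.
source_concepts = {"s1", "s2", "s3", "s4"}
linkset = {("s1", "t1"), ("s1", "t2"), ("s2", "t3")}   # skos:exactMatch pairs

linked_sources = {s for s, _ in linkset}

# Linkset coverage: share of source concepts that have at least one link
coverage = len(linked_sources) / len(source_concepts)

# Linkset cardinality: average number of links per linked source concept
cardinality = len(linkset) / len(linked_sources)

# A cross-walking-style enrichment estimate: labels reachable through the
# links that are not already present on the source concepts
target_labels = {"t1": {"river", "fleuve"}, "t2": {"stream"}, "t3": {"lake"}}
source_labels = {"s1": {"river"}, "s2": {"lake"}, "s3": {"sea"}, "s4": {"bay"}}
gained = sum(len(target_labels[t] - source_labels[s]) for s, t in linkset)

print(f"coverage={coverage:.2f}, cardinality={cardinality:.2f}, labels gained={gained}")
```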


2021 ◽  
Vol 4 (1) ◽  
pp. 26-42
Author(s):  
Khasan Asrori ◽  
Ely Nuryani

Along with technological developments at PT. Barata Indonesia, the company needs support for its organizational activities, such as booking meeting rooms. Currently, meeting-room availability and booking at PT. Barata Indonesia are not supported by technology: a meeting room is booked by contacting the room administrator to ask whether the desired room is available. This is inefficient, because customers cannot see directly which rooms are free for a meeting and match the required capacity. Therefore, this application was built to facilitate the booking of meeting rooms at the company. The meeting room reservation information system was developed with the SDLC Waterfall method, with the aim of simplifying and speeding up access to information. It uses the CodeIgniter 3 framework and has two interfaces: a FrontEnd, the start page of the web application displayed to visitors, and a BackEnd, the admin pages used to manage the required information. The results of the system trial show that the Meeting Room Booking Information System can provide more flexible access to information.
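As a purely illustrative sketch of the availability check at the heart of such a system (the actual application is built with CodeIgniter 3 in PHP; the room names, fields and logic below are assumptions):

```python
# Illustrative sketch: which rooms are free for a requested time slot and
# have sufficient capacity for the number of attendees?
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Booking:
    room: str
    start: datetime
    end: datetime

rooms = {"Merak": 8, "Garuda": 20, "Rajawali": 50}   # room -> capacity
bookings = [Booking("Garuda", datetime(2021, 3, 1, 9), datetime(2021, 3, 1, 11))]

def available_rooms(start, end, attendees):
    def overlaps(b):
        return b.start < end and start < b.end
    return [r for r, cap in rooms.items()
            if cap >= attendees
            and not any(b.room == r and overlaps(b) for b in bookings)]

print(available_rooms(datetime(2021, 3, 1, 10), datetime(2021, 3, 1, 12), 15))
# -> ['Rajawali']  (Garuda is booked 9-11, Merak is too small)
```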


Author(s):  
B. Margan ◽  
F. Hakimpour

Linked Data is data made available on the web in a standard format that supports content inspection and the derivation of insights through semantic queries. Querying and exploring the spatial and temporal features of various data sources is facilitated by using Linked Data. In this paper, an application for linking transport data on the web is presented. Data from the Google Maps API and OpenStreetMap were linked and published on the web. Spatio-temporal queries were executed over the linked transport data, yielding network and traffic information according to the user's position. The client side of this application comprises a web and a mobile application that present a user interface for accessing network and traffic information according to the user's position. The results of the experiment show that, by using the intrinsic potential of Linked Data, we have tackled the challenges of using heterogeneous data sources and have provided useful information that could be used for discovering new patterns. The mobile GIS application makes it possible to assess the benefits of the mentioned technologies in an easy and user-friendly way.
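A minimal sketch of what a spatio-temporal query over such linked traffic data could look like (the vocabulary, data and graph shape are illustrative assumptions, not the authors' actual model):

```python
# Minimal sketch: a spatio-temporal SPARQL query over linked traffic
# observations, filtered around the user's position and a time window.
from rdflib import Graph

DATA = """
@prefix ex:  <http://example.org/traffic/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:obs1 ex:onSegment ex:ValiasrSt ;
        geo:lat "35.721"^^xsd:decimal ; geo:long "51.410"^^xsd:decimal ;
        ex:speedKmh 18 ;
        ex:observedAt "2019-05-04T08:30:00"^^xsd:dateTime .
ex:obs2 ex:onSegment ex:EnghelabSt ;
        geo:lat "35.701"^^xsd:decimal ; geo:long "51.391"^^xsd:decimal ;
        ex:speedKmh 45 ;
        ex:observedAt "2019-05-04T02:00:00"^^xsd:dateTime .
"""

QUERY = """
PREFIX ex:  <http://example.org/traffic/>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?segment ?speed WHERE {
  ?obs ex:onSegment ?segment ; ex:speedKmh ?speed ;
       geo:lat ?lat ; geo:long ?long ;
       ex:observedAt ?t .
  # crude bounding box around the user's position and a morning time window
  FILTER (?lat > 35.71 && ?lat < 35.73 && ?long > 51.40 && ?long < 51.42)
  FILTER (?t >= "2019-05-04T08:00:00"^^xsd:dateTime)
}
"""

g = Graph().parse(data=DATA, format="turtle")
for row in g.query(QUERY):
    print(row.segment, row.speed)
```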


2020 ◽  
Vol 1 (2) ◽  
pp. 72-85
Author(s):  
Angelica Lo Duca ◽  
Andrea Marchetti

Within the field of Digital Humanities, a great effort has been made to digitize documents and collections in order to build catalogs and exhibitions on the Web. In this paper, we present WeME, a Web application for building a knowledge base that can be used to describe digital documents. WeME can be used by different categories of users: archivists/librarians and scholars. WeME extracts information from well-known Linked Data nodes, namely DBpedia and GeoNames, as well as from traditional Web sources such as VIAF. As a use case of WeME, we describe the knowledge base related to Christopher Clavius's correspondence. Clavius was a mathematician and astronomer of the 16th century. He wrote more than 300 letters, most of which are owned by the Historical Archives of the Pontifical Gregorian University (APUG) in Rome. The resulting knowledge base contains 139 links to DBpedia, 83 links to GeoNames and 129 links to VIAF. In order to test the usability of WeME, we invited 26 users to test the application.
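As a hedged sketch of how candidate links to one of these sources might be retrieved (only DBpedia's public SPARQL endpoint is shown; GeoNames and VIAF are queried through their own services, which are not reproduced here, and the query is not WeME's actual code):

```python
# Look up candidate DBpedia resources for a person mentioned in a letter.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?person WHERE {
  ?person a <http://dbpedia.org/ontology/Person> ;
          rdfs:label ?label .
  FILTER (lang(?label) = "en" && CONTAINS(?label, "Christopher Clavius"))
}
LIMIT 5
""")

for b in sparql.query().convert()["results"]["bindings"]:
    print(b["person"]["value"])   # e.g. http://dbpedia.org/resource/Christopher_Clavius
```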


Author(s):  
Lehireche Nesrine ◽  
Malki Mimoun ◽  
Lehireche Ahmed ◽  
Reda Mohamed Hamou

The purpose of the semantic web goes well beyond the simple provision of raw data: it is a matter of linking data together. This data-meshing approach, called linked data (LD), refers to a set of best practices for publishing and interlinking data on the web. From its principles, a new context has emerged, called linked enterprise data (LED). LED is the application of linked data to the information system of the enterprise, addressing the challenges of an IS in order to obtain an agile, high-performing system in which internal data sources are linked to external data and information is easily accessible in good time. This article focuses on using LED to address the challenges of database integration, together with the state of the art for mapping relational databases (RDB) to RDF based on LD. The authors then introduce a proposal for on-demand extract-transform-load (ETL) mapping of RDB to RDF using algorithms. Finally, the authors present a conclusion and discuss their perspectives on implementing the solution.
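A small Python sketch of the basic direct-mapping step that RDB-to-RDF approaches build on (the table, column and namespace names are invented; the authors' on-demand ETL algorithms are not reproduced here):

```python
# Read rows from a relational table and emit one RDF resource per row,
# one triple per column (direct-mapping style).
import sqlite3
from rdflib import Graph, Namespace, Literal, RDF

EX = Namespace("http://example.org/enterprise/")    # placeholder namespace

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Acme SARL', 'Oran')")

g = Graph()
for cid, name, city in conn.execute("SELECT id, name, city FROM customer"):
    subject = EX[f"customer/{cid}"]                  # IRI minted from the primary key
    g.add((subject, RDF.type, EX.Customer))
    g.add((subject, EX.name, Literal(name)))
    g.add((subject, EX.city, Literal(city)))

print(g.serialize(format="turtle"))
```

In an on-demand setting, such a conversion would be triggered per query rather than run as a bulk export, which is the direction the authors' ETL proposal takes.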


2015 ◽  
Vol 23 (1) ◽  
pp. 73-101 ◽  
Author(s):  
Eugene Ferry ◽  
John O Raw ◽  
Kevin Curran

Purpose – The interoperability of cloud data between web applications and mobile devices has vastly improved over recent years. The popularity of social media, smartphones and cloud-based web services has contributed to the level of integration that can be achieved between applications. This paper investigates the potential security issues of OAuth, an authorisation framework for granting third-party applications revocable access to user data. OAuth has rapidly become an interim de facto standard for protecting access to web API data. Vendors implemented OAuth before the open standard was officially published. To evaluate whether the OAuth 2.0 specification is truly ready for industry application, an entire OAuth client-server environment was developed and validated against the specification's threat model. The research also included analysing the security features of several popular OAuth-integrated websites and comparing them with the threat model. High-impact exploits leading to account hijacking were identified on a number of major online publications. It is hypothesised that the OAuth 2.0 specification can be a secure authorisation mechanism when implemented correctly.

Design/methodology/approach – To analyse the security of OAuth implementations in industry, a list of the 50 most popular websites in Ireland was retrieved from the statistical website Alexa (Noureddine and Bashroush, 2011). Each site was analysed to identify whether it utilised OAuth. Out of the 50 sites, 21 were identified with OAuth support. Each vulnerability in the threat model was then tested against each OAuth-enabled site. To test the robustness of the OAuth framework, an entire OAuth environment was required. The proposed solution comprised three parts: a client application, an authorisation server and a resource server. The client application needed to consume OAuth-enabled services. The authorisation server had to manage access to the resource server. The resource server had to expose data from the database based on the authorisation the user was given by the authorisation server. It was decided that the client application would consume emails from Google's Gmail API. The authorisation and resource server were modelled around a basic task-tracking web application. The client application would also consume task data from the developed resource server. The client application would also support Single Sign-On for Google and Facebook, as well as a developed identity provider, "MyTasks". The authorisation server delegated authorisation to the client application and stored cryptographic information for each access grant. The resource server validated the supplied access token via public-key cryptography and returned the requested data.

Findings – Two sites out of the 21 were found to be susceptible to some form of attack, meaning that 10.5 per cent were vulnerable. In total, 18 per cent of the world's 50 most popular sites were in the list of 21 OAuth-enabled sites. The OAuth 2.0 specification is still very much in its infancy, but when implemented correctly it can provide a relatively secure and interoperable authentication delegation mechanism. The IETF are currently addressing issues and expansions in their working drafts. Once a strict level of conformity is achieved between vendors and vulnerabilities are mitigated, it is likely that the framework will change the way we access data on the web and other devices.

Originality/value – OAuth is flexible, in that it offers extensions to support varying situations and existing technologies. A disadvantage of this flexibility is that new extensions typically bring new security exploits. Members of the IETF OAuth Working Group are constantly refining the draft specifications and identifying new threats to the expanding functionality. OAuth provides a flexible authentication mechanism to protect and delegate access to APIs. It solves the problem of password re-use across multiple accounts and stops users from having to disclose their credentials to third parties. Filtering access to information by scope and giving users the option to revoke access at any point gives them control of their data. OAuth does raise security concerns, such as defying phishing education, but there are always going to be security issues with any authentication technology. Although several high-impact vulnerabilities were identified in industry, the developed solution proves the predicted hypothesis that a secure OAuth environment can be built when implemented correctly. Developers must conform to the defined specification and are responsible for validating their implementation against the given threat model. OAuth is an evolving authorisation framework. It is still in its infancy, and much work needs to be done in the specification to achieve stricter validation and vendor conformity. Vendor implementations need to become better aligned in order to provide a rich and truly interoperable authorisation mechanism. Once these issues are resolved, OAuth will be on track to become the definitive authentication standard on the web.
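A generic Python sketch of the OAuth 2.0 authorisation code grant around which such an environment revolves (the endpoints, client credentials and scope below are placeholders, not those of the paper's Gmail or "MyTasks" set-up):

```python
# Generic sketch of the OAuth 2.0 authorisation code grant.
import secrets
from urllib.parse import urlencode
import requests

AUTHZ_ENDPOINT = "https://auth.example.org/authorize"    # placeholder
TOKEN_ENDPOINT = "https://auth.example.org/token"        # placeholder
CLIENT_ID, CLIENT_SECRET = "my-client", "my-secret"      # placeholder credentials
REDIRECT_URI = "https://client.example.org/callback"

# Step 1: redirect the user agent to the authorisation server
state = secrets.token_urlsafe(16)                        # CSRF protection value
auth_url = AUTHZ_ENDPOINT + "?" + urlencode({
    "response_type": "code",
    "client_id": CLIENT_ID,
    "redirect_uri": REDIRECT_URI,
    "scope": "tasks.read",
    "state": state,
})
print("Send the user to:", auth_url)

# Step 2: after the user consents, the callback receives ?code=...&state=...
def exchange_code(code):
    resp = requests.post(TOKEN_ENDPOINT, data={
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": REDIRECT_URI,
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
    })
    resp.raise_for_status()
    return resp.json()["access_token"]

# Step 3: use the bearer token against the resource server
def fetch_tasks(token):
    return requests.get("https://api.example.org/tasks",
                        headers={"Authorization": f"Bearer {token}"}).json()
```

Many of the vulnerabilities discussed in the threat model concern exactly these steps, e.g. omitting or failing to verify the state parameter, or accepting redirect URIs that were not pre-registered.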


2021 ◽  
Vol 81 (3-4) ◽  
pp. 359-383
Author(s):  
Thijs Porck

This article discusses proof-of-concept research into the structure of the vocabularies of three Old English texts: Beowulf, Andreas and the Old English Martyrology. With the help of the Web application Evoke, which makes A Thesaurus of Old English (TOE) available in Linguistic Linked Data form, the words that occur in these three texts have been tagged within the existing onomasiological structure of TOE. This tagging process has resulted in prototypes of ‘textual thesauri’ for each of the three texts; such thesauri allow researchers to analyse the ‘onomasiological profile’ of a text using the statistical tools built into Evoke. Since the same overarching structure has been used for all three texts, the texts can now be compared on an onomasiological level. As the article demonstrates, this comparative approach gives rise to novel research questions, as new and distinctive patterns of vocabulary use come to the surface. The semantic fields discussed include “War” and “Animals”.
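A toy Python sketch of what computing such an ‘onomasiological profile’ amounts to (the word list and category mapping are invented; Evoke's own statistics operate on the full TOE structure):

```python
# Count how many tagged tokens of a text fall into each thesaurus category.
from collections import Counter

# word -> thesaurus category, as produced by the tagging step (toy mapping)
toe_category = {
    "wig": "War", "beadu": "War", "sweord": "War",
    "wulf": "Animals", "hron": "Animals",
    "lufu": "Emotion",
}

def onomasiological_profile(tokens):
    profile = Counter(toe_category[t] for t in tokens if t in toe_category)
    total = sum(profile.values())
    return {cat: n / total for cat, n in profile.most_common()}

sample = ["wig", "sweord", "wulf", "hron", "beadu", "lufu", "ond"]
print(onomasiological_profile(sample))
# e.g. {'War': 0.5, 'Animals': 0.33, 'Emotion': 0.17}
```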

