Role of Vocabulary for Semantic Interoperability in Enabling the Linked Open Data Publishing

2012, Vol 4 (5), pp. 21-37
Author(s):  
Ahsan Morshed
Author(s):  
Lyubomir Penev ◽  
Teodor Georgiev ◽  
Viktor Senderov ◽  
Mariya Dimitrova ◽  
Pavel Stoev

As one of the first advocates of open access and open data in the field of biodiversity publishing, Pensoft has adopted a multiple data publishing model, resulting in the ARPHA-BioDiv toolbox (Penev et al. 2017). ARPHA-BioDiv consists of several data publishing workflows and tools described in the Strategies and Guidelines for Publishing of Biodiversity Data and elsewhere: (1) data underlying research results are deposited in an external repository and/or published as supplementary file(s) to the article and then linked/cited in the article text; supplementary files are published under their own DOIs and bear their own citation details; (2) data deposited in trusted repositories and/or supplementary files are described in data papers; data papers may be submitted in text format or converted into manuscripts from Ecological Metadata Language (EML) metadata; (3) integrated narrative and data publishing is realised by the Biodiversity Data Journal, where structured data are imported into the article text from tables or via web services and downloaded/distributed from the published article; (4) data are published in structured, semantically enriched full-text XML, so that several data elements can thereafter easily be harvested by machines; (5) Linked Open Data (LOD) are extracted from the literature, converted into interoperable RDF triples in accordance with the OpenBiodiv-O ontology (Senderov et al. 2018) and stored in the OpenBiodiv Biodiversity Knowledge Graph.
The above-mentioned approaches are supported by a whole ecosystem of additional workflows and tools, for example: (1) pre-publication data auditing, involving both human and machine data quality checks (workflow 2); (2) web-service integration with data repositories and data centres, such as the Global Biodiversity Information Facility (GBIF), Barcode of Life Data Systems (BOLD), Integrated Digitized Biocollections (iDigBio), Data Observation Network for Earth (DataONE), Long Term Ecological Research (LTER), PlutoF, Dryad, and others (workflows 1, 2); (3) semantic markup of the article texts in the TaxPub format, facilitating further extraction, distribution and re-use of sub-article elements and data (workflows 3, 4); (4) server-to-server import of specimen data from GBIF, BOLD, iDigBio and PlutoF into manuscript text (workflow 3); (5) automated conversion of EML metadata into data paper manuscripts (workflow 2); (6) export of Darwin Core Archives and automated deposition in GBIF (workflow 3); (7) submission of individual images and supplementary data under their own DOIs to the Biodiversity Literature Repository, BLR (workflows 1-3); (8) conversion of key data elements from TaxPub articles and taxonomic treatments extracted by Plazi into RDF handled by OpenBiodiv (workflow 5). These approaches represent different aspects of the prospective scholarly publishing of biodiversity data which, in combination with text and data mining (TDM) technologies for legacy literature (PDF) developed by Plazi, lay the ground for an entire data publishing ecosystem for biodiversity, supplying FAIR (Findable, Accessible, Interoperable and Reusable) data to several interoperable overarching infrastructures, such as GBIF, BLR, Plazi TreatmentBank, OpenBiodiv, and various end users.
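To make workflow 5 a little more concrete, below is a minimal sketch of how one treatment-level record could be expressed as RDF triples with rdflib. The openbiodiv.net identifiers and the Treatment class name are illustrative assumptions rather than the published OpenBiodiv-O terms; FaBiO and Dublin Core Terms are real vocabularies.

```python
# Minimal sketch (not Pensoft's actual pipeline): one taxonomic treatment
# expressed as RDF triples in the spirit of workflow 5. The openbiodiv.net
# URIs and the Treatment class below are illustrative assumptions.
from rdflib import Graph, Namespace, URIRef, Literal
from rdflib.namespace import RDF, RDFS, DCTERMS

OPENBIODIV = Namespace("http://openbiodiv.net/")   # assumed base URI
FABIO = Namespace("http://purl.org/spar/fabio/")   # FaBiO (real SPAR vocabulary)

g = Graph()
g.bind("openbiodiv", OPENBIODIV)
g.bind("fabio", FABIO)

article = URIRef(OPENBIODIV["article/10.3897-BDJ.example"])  # hypothetical identifier
treatment = URIRef(OPENBIODIV["treatment/example-0001"])      # hypothetical identifier

# The article is a journal article; the treatment is a part of it.
g.add((article, RDF.type, FABIO.JournalArticle))
g.add((article, DCTERMS.title, Literal("Example treatment-bearing article")))
g.add((treatment, RDF.type, OPENBIODIV.Treatment))  # assumed class name
g.add((treatment, DCTERMS.isPartOf, article))
g.add((treatment, RDFS.label, Literal("Treatment of an example species")))

print(g.serialize(format="turtle"))
```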


2010, Vol 5 (1), p. 43
Author(s):  
Matthias Samwald ◽  
Michel Dumontier ◽  
Jun Zhao ◽  
Joanne S Luciano ◽  
Michael Marshall ◽  
...  

Data, 2021, Vol 6 (8), p. 93
Author(s):  
Raf Buyle ◽  
Brecht Van de Vyvere ◽  
Julián Rojas Meléndez ◽  
Dwight Van Lancker ◽  
Eveline Vlassenroot ◽  
...  

Smart cities need (sensor) data for better decision-making. However, while there are vast amounts of data available about and from cities, an intermediary is needed that connects and interprets (sensor) data on a Web scale. Today, governments in Europe are struggling to publish open data in a sustainable, predictable and cost-effective way. Our research question asks which methods for publishing Linked Open Data time series, in particular air quality data, are sustainable and cost-effective. Furthermore, we demonstrate the cross-domain applicability of our data publishing approach through a different use case on railway infrastructure Linked Open Data. Based on scenarios co-created with various governmental stakeholders, we researched methods to promote data interoperability, scalability and flexibility. The results show that applying a Linked Data Fragments-based approach to public endpoints for air quality and railway infrastructure data lowers the cost of publishing and increases availability due to better Web caching strategies.
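As a rough illustration of the Linked Data Fragments idea (the server exposes small, cacheable fragments and the client does the work of following pagination links), here is a minimal client sketch. The endpoint URL is a placeholder, not the endpoint used in the study, and the exact pagination vocabulary can differ between deployments.

```python
# Minimal sketch of a client that pages through a Linked Data Fragments-style
# interface by following hydra:next links. The endpoint URL is a placeholder.
import requests
from rdflib import Graph, Namespace
from rdflib.namespace import RDF

HYDRA = Namespace("http://www.w3.org/ns/hydra/core#")
SOSA = Namespace("http://www.w3.org/ns/sosa/")  # W3C SOSA observation vocabulary

def fetch_fragments(start_url, max_pages=5):
    """Yield RDF graphs for successive fragments of a time-series collection."""
    url = start_url
    for _ in range(max_pages):
        resp = requests.get(url, headers={"Accept": "text/turtle"})
        resp.raise_for_status()
        g = Graph()
        g.parse(data=resp.text, format="turtle")
        yield g
        # Follow the pagination link if the fragment advertises one.
        next_page = next(g.objects(None, HYDRA["next"]), None)
        if next_page is None:
            break
        url = str(next_page)

# Hypothetical usage: count sosa:Observation resources across the fetched fragments.
total = 0
for fragment in fetch_fragments("https://example.org/airquality/fragments"):  # placeholder URL
    total += sum(1 for _ in fragment.subjects(RDF.type, SOSA.Observation))
print("observations seen:", total)
```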


2015, Vol 6 (2)
Author(s):  
Susan Brown ◽  
John Simpson

Linked open data provides a means of producing an interlinked and more navigable scholarly environment that permits better integration of research materials, the potential to address the specificities of nomenclature, discourses, and methodologies, and the ability to respect institutional and individual investments. The paper proposes a linked data publishing ecology based on collaborations between the scholarly, publishing, and library communities, tempered by a consideration of the current state of linked data publishing practices and of the infrastructure gaps that stand in the way of such collaboration, particularly in the humanities.


Author(s):  
Ahsan Morshed

In spite of the explosive growth of the Internet, information relevant to users is often unavailable even when using the latest browsers. At the same time, there is an ever-increasing number of documents that vary widely in content, format, and quality. These documents often change in content and location because they are not under any kind of centralized control. On the other hand, there is a huge number of unknown users with extremely diverse needs, skills, education, and cultural and language backgrounds. One solution to these problems is to use standard terms with agreed meaning, which can be termed a controlled vocabulary (CV). Though there is no single agreed notion of a CV, we can define it as a set of concepts or preferred terms and the relations that exist among them. These vocabularies play a very important role in classifying information. In this chapter, we focus on the role of CVs in publishing the web of data on the Web.
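A common concrete realisation of such a controlled vocabulary on the web of data is SKOS. The sketch below, with purely illustrative concepts and a placeholder namespace, shows preferred terms and the broader/narrower relations among them expressed as RDF.

```python
# Minimal sketch of a controlled vocabulary in SKOS: preferred terms,
# alternative labels, and hierarchical relations. Concept URIs and labels
# are illustrative assumptions.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, SKOS

EX = Namespace("http://example.org/vocab/")  # placeholder namespace

g = Graph()
g.bind("skos", SKOS)

# A scheme groups the concepts of one controlled vocabulary.
g.add((EX.animals, RDF.type, SKOS.ConceptScheme))

# Each concept has a preferred label and may have alternative labels.
g.add((EX.bird, RDF.type, SKOS.Concept))
g.add((EX.bird, SKOS.prefLabel, Literal("Bird", lang="en")))
g.add((EX.bird, SKOS.inScheme, EX.animals))

g.add((EX.parrot, RDF.type, SKOS.Concept))
g.add((EX.parrot, SKOS.prefLabel, Literal("Parrot", lang="en")))
g.add((EX.parrot, SKOS.altLabel, Literal("Psittacine", lang="en")))
g.add((EX.parrot, SKOS.inScheme, EX.animals))

# Hierarchical relation between terms: parrot is narrower than bird.
g.add((EX.parrot, SKOS.broader, EX.bird))
g.add((EX.bird, SKOS.narrower, EX.parrot))

print(g.serialize(format="turtle"))
```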


Author(s):  
Caio Saraiva Coneglian ◽  
José Eduardo Santarem Segundo

The emergence of new technologies has introduced more efficient means of disseminating and making information available. One initiative, called Europeana, has been promoting this adaptation of informational objects to the Web, and more specifically to Linked Data. Accordingly, the present study aims to discuss the relationship between the Digital Humanities and Linked Open Data, as embodied by Europeana. To this end, we use an exploratory methodology that examines questions related to the Europeana data model, EDM, by means of SPARQL. As a result, we characterise the features of the EDM through the use of SPARQL. We also identify the importance of the concept of Digital Humanities within the context of Europeana. Keywords: Semantic web. Linked open data. Digital humanities. Europeana. EDM. Link: https://periodicos.ufsc.br/index.php/eb/article/view/1518-2924.2017v22n48p88/33031
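As an illustration of the kind of exploratory SPARQL querying described above, the snippet below issues a small query for EDM resources through SPARQLWrapper. The endpoint address and the availability of the public service are assumptions, and the query is deliberately minimal.

```python
# Minimal sketch: list a handful of EDM provided cultural heritage objects
# from Europeana's public SPARQL endpoint. The endpoint URL is an assumption.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://sparql.europeana.eu/")  # assumed endpoint address
sparql.setQuery("""
    PREFIX edm: <http://www.europeana.eu/schemas/edm/>

    SELECT ?cho WHERE {
        ?cho a edm:ProvidedCHO .
    }
    LIMIT 10
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["cho"]["value"])
```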


2021, Vol 11 (5), p. 2405
Author(s):  
Yuxiang Sun ◽  
Tianyi Zhao ◽  
Seulgi Yoon ◽  
Yongju Lee

Semantic Web has recently gained traction with the use of Linked Open Data (LOD) on the Web. Although numerous state-of-the-art methodologies, standards, and technologies are applicable to the LOD cloud, many issues persist. Because the LOD cloud is based on graph-based resource description framework (RDF) triples and the SPARQL query language, we cannot directly adopt traditional techniques employed for database management systems or distributed computing systems. This paper addresses how the LOD cloud can be efficiently organized, retrieved, and evaluated. We propose a novel hybrid approach that combines the index and live exploration approaches for improved LOD join query performance. Using a two-step index structure combining a disk-based 3D R*-tree with the extended multidimensional histogram and flash memory-based k-d trees, we can efficiently discover interlinked data distributed across multiple resources. Because this method rapidly prunes numerous false hits, the performance of join query processing is remarkably improved. We also propose a hot-cold segment identification algorithm to identify regions of high interest. The proposed method is compared with existing popular methods on real RDF datasets. Results indicate that our method outperforms the existing methods because it can quickly obtain target results by reducing unnecessary data scanning and reduce the amount of main memory required to load filtering results.
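The hot-cold segment identification step can be illustrated with a very small sketch: segments of the index touched most often by recent queries are treated as hot and kept in fast storage, the rest as cold. The frequency-threshold rule and data structures below are illustrative assumptions, not the paper's exact algorithm.

```python
# Minimal sketch of hot-cold segment identification by access frequency.
# The fixed hot_fraction threshold is an illustrative assumption.
from collections import Counter

def classify_segments(access_log, hot_fraction=0.2):
    """Return (hot, cold) sets of segment IDs based on recent access counts."""
    counts = Counter(access_log)                       # segment_id -> hit count
    ranked = [seg for seg, _ in counts.most_common()]  # most accessed first
    cutoff = max(1, int(len(ranked) * hot_fraction))
    hot = set(ranked[:cutoff])
    cold = set(ranked[cutoff:])
    return hot, cold

# Hypothetical usage: each entry is the index segment touched by one join-query probe.
log = ["seg3", "seg3", "seg7", "seg3", "seg1", "seg7", "seg9", "seg3"]
hot, cold = classify_segments(log)
print("hot:", sorted(hot), "cold:", sorted(cold))
```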

