Medieval manuscripts and their migrations: Using SPARQL to investigate the research potential of an aggregated Knowledge Graph

Digital Medievalist ◽

10.16995/dm.8064 ◽

2021 ◽

Author(s):

Hanno Wijsman ◽

Toby Burrows ◽

Laura Cleaver ◽

Doug Emery ◽

Eero Hyvönen ◽

...

Keyword(s):

Query Language ◽

Open Data ◽

Literacy Skills ◽

Linked Open Data ◽

Knowledge Graph ◽

Manuscript Culture ◽

The People ◽

Computer Scientists ◽

Data Environment ◽

Manuscript Description

Although the RDF query language SPARQL has a reputation for being opaque and difficult for traditional humanists to learn, it holds great potential for opening up vast amounts of Linked Open Data to researchers willing to take on its challenges. This is especially true in the field of premodern manuscripts studies as more and more datasets relating to the study of manuscript culture are made available online. This paper explores the results of a two-year long process of collaborative learning and knowledge transfer between the computer scientists and humanities researchers from the Mapping Manuscript Migrations (MMM) project to learn and apply SPARQL to the MMM dataset. The process developed into a wider investigation of the use of SPARQL to analyse the data, refine research questions, and assess the research potential of the MMM aggregated dataset and its Knowledge Graph. Through an examination of a series of six SPARQL query case studies, this paper will demonstrate how the process of learning and applying SPARQL to query the MMM dataset returned three important and unexpected results: 1) a better understanding of a complex and imperfect dataset in a Linked Open Data environment, 2) a better understanding of how manuscript description and associated data involving the people and institutions involved in the production, reception, and trade of premodern manuscripts needs to be presented to better facilitate computational research, and 3) an awareness of need to further develop data literacy skills among researchers in order to take full advantage of the wealth of unexplored data now available to them in the Semantic Web.

Download Full-text

A Hybrid Approach Combining R*-Tree and k-d Trees to Improve Linked Open Data Query Performance

Applied Sciences ◽

10.3390/app11052405 ◽

2021 ◽

Vol 11 (5) ◽

pp. 2405

Author(s):

Yuxiang Sun ◽

Tianyi Zhao ◽

Seulgi Yoon ◽

Yongju Lee

Keyword(s):

Flash Memory ◽

Query Language ◽

Hybrid Approach ◽

Open Data ◽

Main Memory ◽

Linked Open Data ◽

Index Structure ◽

Identification Algorithm ◽

Distributed Computing Systems ◽

Query Performance

Semantic Web has recently gained traction with the use of Linked Open Data (LOD) on the Web. Although numerous state-of-the-art methodologies, standards, and technologies are applicable to the LOD cloud, many issues persist. Because the LOD cloud is based on graph-based resource description framework (RDF) triples and the SPARQL query language, we cannot directly adopt traditional techniques employed for database management systems or distributed computing systems. This paper addresses how the LOD cloud can be efficiently organized, retrieved, and evaluated. We propose a novel hybrid approach that combines the index and live exploration approaches for improved LOD join query performance. Using a two-step index structure combining a disk-based 3D R*-tree with the extended multidimensional histogram and flash memory-based k-d trees, we can efficiently discover interlinked data distributed across multiple resources. Because this method rapidly prunes numerous false hits, the performance of join query processing is remarkably improved. We also propose a hot-cold segment identification algorithm to identify regions of high interest. The proposed method is compared with existing popular methods on real RDF datasets. Results indicate that our method outperforms the existing methods because it can quickly obtain target results by reducing unnecessary data scanning and reduce the amount of main memory required to load filtering results.

Download Full-text

The Pensoft Data Publishing Workflow: The FAIRway from articles to Linked Open Data

Biodiversity Information Science and Standards ◽

10.3897/biss.3.35902 ◽

2019 ◽

Vol 3 ◽

Author(s):

Lyubomir Penev ◽

Teodor Georgiev ◽

Viktor Senderov ◽

Mariya Dimitrova ◽

Pavel Stoev

Keyword(s):

Open Data ◽

Structured Data ◽

Linked Open Data ◽

Data Publishing ◽

Knowledge Graph ◽

Supplementary File ◽

Biodiversity Data ◽

Text Format ◽

Biodiversity Knowledge ◽

Data Elements

As one of the first advocates of open access and open data in the field of biodiversity publishiing, Pensoft has adopted a multiple data publishing model, resulting in the ARPHA-BioDiv toolbox (Penev et al. 2017). ARPHA-BioDiv consists of several data publishing workflows and tools described in the Strategies and Guidelines for Publishing of Biodiversity Data and elsewhere: Data underlying research results are deposited in an external repository and/or published as supplementary file(s) to the article and then linked/cited in the article text; supplementary files are published under their own DOIs and bear their own citation details. Data deposited in trusted repositories and/or supplementary files and described in data papers; data papers may be submitted in text format or converted into manuscripts from Ecological Metadata Language (EML) metadata. Integrated narrative and data publishing realised by the Biodiversity Data Journal, where structured data are imported into the article text from tables or via web services and downloaded/distributed from the published article. Data published in structured, semanticaly enriched, full-text XMLs, so that several data elements can thereafter easily be harvested by machines. Linked Open Data (LOD) extracted from literature, converted into interoperable RDF triples in accordance with the OpenBiodiv-O ontology (Senderov et al. 2018) and stored in the OpenBiodiv Biodiversity Knowledge Graph. Data underlying research results are deposited in an external repository and/or published as supplementary file(s) to the article and then linked/cited in the article text; supplementary files are published under their own DOIs and bear their own citation details. Data deposited in trusted repositories and/or supplementary files and described in data papers; data papers may be submitted in text format or converted into manuscripts from Ecological Metadata Language (EML) metadata. Integrated narrative and data publishing realised by the Biodiversity Data Journal, where structured data are imported into the article text from tables or via web services and downloaded/distributed from the published article. Data published in structured, semanticaly enriched, full-text XMLs, so that several data elements can thereafter easily be harvested by machines. Linked Open Data (LOD) extracted from literature, converted into interoperable RDF triples in accordance with the OpenBiodiv-O ontology (Senderov et al. 2018) and stored in the OpenBiodiv Biodiversity Knowledge Graph. The above mentioned approaches are supported by a whole ecosystem of additional workflows and tools, for example: (1) pre-publication data auditing, involving both human and machine data quality checks (workflow 2); (2) web-service integration with data repositories and data centres, such as Global Biodiversity Information Facility (GBIF), Barcode of Life Data Systems (BOLD), Integrated Digitized Biocollections (iDigBio), Data Observation Network for Earth (DataONE), Long Term Ecological Research (LTER), PlutoF, Dryad, and others (workflows 1,2); (3) semantic markup of the article texts in the TaxPub format facilitating further extraction, distribution and re-use of sub-article elements and data (workflows 3,4); (4) server-to-server import of specimen data from GBIF, BOLD, iDigBio and PlutoR into manuscript text (workflow 3); (5) automated conversion of EML metadata into data paper manuscripts (workflow 2); (6) export of Darwin Core Archive and automated deposition in GBIF (workflow 3); (7) submission of individual images and supplementary data under own DOIs to the Biodiversity Literature Repository, BLR (workflows 1-3); (8) conversion of key data elements from TaxPub articles and taxonomic treatments extracted by Plazi into RDF handled by OpenBiodiv (workflow 5). These approaches represent different aspects of the prospective scholarly publishing of biodiversity data, which in a combination with text and data mining (TDM) technologies for legacy literature (PDF) developed by Plazi, lay the ground of an entire data publishing ecosystem for biodiversity, supplying FAIR (Findable, Accessible, Interoperable and Reusable data to several interoperable overarching infrastructures, such as GBIF, BLR, Plazi TreatmentBank, OpenBiodiv and various end users.

Download Full-text

Awareness of Linked Open Data Among the Employees of Polish Libraries, Archives, and Museums

Zagadnienia Informacji Naukowej - Studia Informacyjne ◽

10.36702/zin.826 ◽

2022 ◽

Vol 59 (2(118)) ◽

pp. 7-25

Author(s):

Dorota Siwecka

Keyword(s):

Linked Data ◽

Open Data ◽

Linked Open Data ◽

Survey Method ◽

Doctorate Degree ◽

Research Libraries ◽

The People ◽

Central Statistical ◽

Central Statistical Office ◽

The Subject

Purpose/Thesis: This article presents the results of a survey conducted in January 2021 among employees of Polish libraries, museums, and archives, examining their awareness of open linked data technologies. The research had a pilot character and its results will be used to improve the questionnaire and to conduct research on a wider scale. Approach/Methods: The survey method was used in the study. Results and conclusions: On the basis of answers received, it can be concluded that open linked data is not yet very well-known among employees of Polish libraries, museums, and archives. Those most aware of technologies allowing for machine understanding of content shared on the Web are doctorate degree-holders employed in research libraries. Furthermore, awareness of the projects using LOD technologies does not correlate with awareness of these technological solutions. Research limitations: The number of respondents (415) constitutes 1% of all the people employed in libraries, archives, and museums in Poland (based on data provided by the Central Statistical Office of Poland). This is not a large number, but considering the variety among the respondents, the sample can be considered representative. Originality/Value: The awareness of Linked Open Data among employees of Polish libraries, archives, and museums has not been the subject of any study so far. In fact, this type of research has not been conducted in other countries either.

Download Full-text

Constructing Biomedical Knowledge Graph Based on SemMedDB and Linked Open Data

2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) ◽

10.1109/bibm.2018.8621568 ◽

2018 ◽

Cited By ~ 6

Author(s):

Qing Cong ◽

Zhiyong Feng ◽

Fang Li ◽

Li Zhang ◽

Guozheng Rao ◽

...

Keyword(s):

Open Data ◽

Linked Open Data ◽

Knowledge Graph ◽

Biomedical Knowledge

Download Full-text

Robustifying Scholia: paving the way for knowledge discovery and research assessment through Wikidata

Research Ideas and Outcomes ◽

10.3897/rio.5.e35820 ◽

2019 ◽

Vol 5 ◽

Cited By ~ 2

Author(s):

Lane Rasberry ◽

Egon Willighagen ◽

Finn Nielsen ◽

Daniel Mietchen

Keyword(s):

Knowledge Discovery ◽

Large Scale ◽

Knowledge Workers ◽

Open Data ◽

Linked Open Data ◽

Data Curation ◽

Research Assessment ◽

Knowledge Graph ◽

Incomplete Datasets ◽

Active Contributor

Knowledge workers like researchers, students, journalists, research evaluators or funders need tools to explore what is known, how it was discovered, who made which contributions, and where the scholarly record has gaps. Existing tools and services of this kind are not available as Linked Open Data, but Wikidata is. It has the technology, active contributor base, and content to build a large-scale knowledge graph for scholarship, also known as WikiCite. Scholia visualizes this graph in an exploratory interface with profiles and links to the literature. However, it is just a working prototype. This project aims to "robustify Scholia" with back-end development and testing based on pilot corpora. The main objective at this stage is to attain stability in challenging cases such as server throttling and handling of large or incomplete datasets. Further goals include integrating Scholia with data curation and manuscript writing workflows, serving more languages, generating usage stats, and documentation.

Download Full-text

Modern Users of Libraries and the Linked Open Data Environment

Bibliotekovedenie [Library and Information Science (Russia)] ◽

10.25281/0869-608x-2020-69-3-243-260 ◽

2020 ◽

Vol 69 (3) ◽

pp. 243-260

Author(s):

Olga A. Lavrenova ◽

Andrey A. Vinberg

Keyword(s):

Classification System ◽

Query Language ◽

Open Data ◽

Basic Research ◽

Linked Open Data ◽

Knowledge Organization ◽

Global Network ◽

Knowledge Organization System ◽

Description Framework ◽

State Library

The goal of any library is to ensure high quality and general availability of information retrieval tools. The paper describes the project implemented by the Russian State Library (RSL) to present Library Bibliographic Classification as a Networked Knowledge Organization System. The project goal is to support content and provide tools for ensuring system’s interoperability with other resources of the same nature (i.e. with Linked Data Vocabularies) in the global network environment. The project was partially supported by the Russian Foundation for Basic Research (RFBR).The RSL General Classified Catalogue (GCC) was selected as the main data source for the Classification system of knowledge organization. The meaning of each classification number is expressed by complete string of wordings (captions), rather than the last level caption alone. Data converted to the Resource Description Framework (RDF) files based on the standard set of properties defined in the Simple Knowledge Organization System (SKOS) model was loaded into the semantic storage for subsequent data processing using the SPARQL query language. In order to enrich user queries for search of resources, the RSL has published its Classification System in the form of Linked Open Data (https://lod.rsl.ru) for searching in the RSL electronic catalogue. Currently, the work is underway to enable its smooth integration with other LOD vocabularies. The SKOS mapping tags are used to differentiate the types of connections between SKOS elements (concepts) existing in different concept schemes, for example, UDC, MeSH, authority data.The conceptual schemes of the leading classifications are fundamentally different from each other. Establishing correspondence between concepts is possible only on the basis of lexical and structural analysis to compute the concept similarity as a combination of attributes.The authors are looking forward to working with libraries in Russia and other countries to create a common space of Linked Open Data vocabularies.

Download Full-text

Using SPARQL to access Linked Open Data

The Programming Historian ◽

10.46430/phen0047 ◽

2015 ◽

Author(s):

Matthew Lincoln

Keyword(s):

Query Language ◽

Open Data ◽

Linked Open Data ◽

Graph Databases ◽

Cultural Institutions

This lesson explains why many cultural institutions are adopting graph databases, and how researchers can access these data though the query language called SPARQL.

Download Full-text

Introduction to the Principles of Linked Open Data

The Programming Historian ◽

10.46430/phen0068 ◽

2017 ◽

Author(s):

Jonathan Blaney

Keyword(s):

Query Language ◽

Open Data ◽

Linked Open Data ◽

Graph Query ◽

Graph Query Language ◽

Core Concepts

Introduces core concepts of Linked Open Data, including URIs, ontologies, RDF formats, and a gentle intro to the graph query language SPARQL.

Download Full-text

Library Bibliographic Classification as a traditional knowledge organization system in the linked open data environment

Scientific and Technical Libraries ◽

10.33186/1027-3689-2017-4-44-60 ◽

2017 ◽

pp. 44-60 ◽

Cited By ~ 2

Author(s):

Olga Lavrenova ◽

Vasili Pavlov

Keyword(s):

Data Structure ◽

Semantic Web ◽

Traditional Knowledge ◽

Open Data ◽

Linked Open Data ◽

Knowledge Organization ◽

Knowledge Model ◽

Library Catalog ◽

Knowledge Organization System ◽

Data Environment

The task of the project is to introduce classification knowledge model as Linked Open Data, LOD, and to provide access to it from Semantic Web through standard network instruments. The reviewed system of RSL systematic catalog (Library Bibliographic Classification) subject divisions is accepted as a source for classification knowledge model The authors examine SKOS-based data structure and the software. Operation principles in terms of the search in e-library catalog and RSL traditional collections are discussed.

Download Full-text

Ein Knowledge Graph für wissenschaftliche Sammlungen : Generierung von Linked Open Data für heterogene museale Sammlungen auf der Basis des ASCH-Modells

10.15771/ma_2019_1 ◽

2019 ◽

Author(s):

◽

Antje Niemann

Keyword(s):

Open Data ◽

Linked Open Data ◽

Knowledge Graph

Download Full-text