Constructing Biomedical Knowledge Graph Based on SemMedDB and Linked Open Data

As one of the first advocates of open access and open data in the field of biodiversity publishiing, Pensoft has adopted a multiple data publishing model, resulting in the ARPHA-BioDiv toolbox (Penev et al. 2017). ARPHA-BioDiv consists of several data publishing workflows and tools described in the Strategies and Guidelines for Publishing of Biodiversity Data and elsewhere: Data underlying research results are deposited in an external repository and/or published as supplementary file(s) to the article and then linked/cited in the article text; supplementary files are published under their own DOIs and bear their own citation details. Data deposited in trusted repositories and/or supplementary files and described in data papers; data papers may be submitted in text format or converted into manuscripts from Ecological Metadata Language (EML) metadata. Integrated narrative and data publishing realised by the Biodiversity Data Journal, where structured data are imported into the article text from tables or via web services and downloaded/distributed from the published article. Data published in structured, semanticaly enriched, full-text XMLs, so that several data elements can thereafter easily be harvested by machines. Linked Open Data (LOD) extracted from literature, converted into interoperable RDF triples in accordance with the OpenBiodiv-O ontology (Senderov et al. 2018) and stored in the OpenBiodiv Biodiversity Knowledge Graph. Data underlying research results are deposited in an external repository and/or published as supplementary file(s) to the article and then linked/cited in the article text; supplementary files are published under their own DOIs and bear their own citation details. Data deposited in trusted repositories and/or supplementary files and described in data papers; data papers may be submitted in text format or converted into manuscripts from Ecological Metadata Language (EML) metadata. Integrated narrative and data publishing realised by the Biodiversity Data Journal, where structured data are imported into the article text from tables or via web services and downloaded/distributed from the published article. Data published in structured, semanticaly enriched, full-text XMLs, so that several data elements can thereafter easily be harvested by machines. Linked Open Data (LOD) extracted from literature, converted into interoperable RDF triples in accordance with the OpenBiodiv-O ontology (Senderov et al. 2018) and stored in the OpenBiodiv Biodiversity Knowledge Graph. The above mentioned approaches are supported by a whole ecosystem of additional workflows and tools, for example: (1) pre-publication data auditing, involving both human and machine data quality checks (workflow 2); (2) web-service integration with data repositories and data centres, such as Global Biodiversity Information Facility (GBIF), Barcode of Life Data Systems (BOLD), Integrated Digitized Biocollections (iDigBio), Data Observation Network for Earth (DataONE), Long Term Ecological Research (LTER), PlutoF, Dryad, and others (workflows 1,2); (3) semantic markup of the article texts in the TaxPub format facilitating further extraction, distribution and re-use of sub-article elements and data (workflows 3,4); (4) server-to-server import of specimen data from GBIF, BOLD, iDigBio and PlutoR into manuscript text (workflow 3); (5) automated conversion of EML metadata into data paper manuscripts (workflow 2); (6) export of Darwin Core Archive and automated deposition in GBIF (workflow 3); (7) submission of individual images and supplementary data under own DOIs to the Biodiversity Literature Repository, BLR (workflows 1-3); (8) conversion of key data elements from TaxPub articles and taxonomic treatments extracted by Plazi into RDF handled by OpenBiodiv (workflow 5). These approaches represent different aspects of the prospective scholarly publishing of biodiversity data, which in a combination with text and data mining (TDM) technologies for legacy literature (PDF) developed by Plazi, lay the ground of an entire data publishing ecosystem for biodiversity, supplying FAIR (Findable, Accessible, Interoperable and Reusable data to several interoperable overarching infrastructures, such as GBIF, BLR, Plazi TreatmentBank, OpenBiodiv and various end users.

Download Full-text

Robustifying Scholia: paving the way for knowledge discovery and research assessment through Wikidata

Research Ideas and Outcomes ◽

10.3897/rio.5.e35820 ◽

2019 ◽

Vol 5 ◽

Cited By ~ 2

Author(s):

Lane Rasberry ◽

Egon Willighagen ◽

Finn Nielsen ◽

Daniel Mietchen

Keyword(s):

Knowledge Discovery ◽

Large Scale ◽

Knowledge Workers ◽

Open Data ◽

Linked Open Data ◽

Data Curation ◽

Research Assessment ◽

Knowledge Graph ◽

Incomplete Datasets ◽

Active Contributor

Knowledge workers like researchers, students, journalists, research evaluators or funders need tools to explore what is known, how it was discovered, who made which contributions, and where the scholarly record has gaps. Existing tools and services of this kind are not available as Linked Open Data, but Wikidata is. It has the technology, active contributor base, and content to build a large-scale knowledge graph for scholarship, also known as WikiCite. Scholia visualizes this graph in an exploratory interface with profiles and links to the literature. However, it is just a working prototype. This project aims to "robustify Scholia" with back-end development and testing based on pilot corpora. The main objective at this stage is to attain stability in challenging cases such as server throttling and handling of large or incomplete datasets. Further goals include integrating Scholia with data curation and manuscript writing workflows, serving more languages, generating usage stats, and documentation.

Download Full-text

Medieval manuscripts and their migrations: Using SPARQL to investigate the research potential of an aggregated Knowledge Graph

Digital Medievalist ◽

10.16995/dm.8064 ◽

2021 ◽

Author(s):

Hanno Wijsman ◽

Toby Burrows ◽

Laura Cleaver ◽

Doug Emery ◽

Eero Hyvönen ◽

...

Keyword(s):

Query Language ◽

Open Data ◽

Literacy Skills ◽

Linked Open Data ◽

Knowledge Graph ◽

Manuscript Culture ◽

The People ◽

Computer Scientists ◽

Data Environment ◽

Manuscript Description

Although the RDF query language SPARQL has a reputation for being opaque and difficult for traditional humanists to learn, it holds great potential for opening up vast amounts of Linked Open Data to researchers willing to take on its challenges. This is especially true in the field of premodern manuscripts studies as more and more datasets relating to the study of manuscript culture are made available online. This paper explores the results of a two-year long process of collaborative learning and knowledge transfer between the computer scientists and humanities researchers from the Mapping Manuscript Migrations (MMM) project to learn and apply SPARQL to the MMM dataset. The process developed into a wider investigation of the use of SPARQL to analyse the data, refine research questions, and assess the research potential of the MMM aggregated dataset and its Knowledge Graph. Through an examination of a series of six SPARQL query case studies, this paper will demonstrate how the process of learning and applying SPARQL to query the MMM dataset returned three important and unexpected results: 1) a better understanding of a complex and imperfect dataset in a Linked Open Data environment, 2) a better understanding of how manuscript description and associated data involving the people and institutions involved in the production, reception, and trade of premodern manuscripts needs to be presented to better facilitate computational research, and 3) an awareness of need to further develop data literacy skills among researchers in order to take full advantage of the wealth of unexplored data now available to them in the Semantic Web.

Download Full-text

Ein Knowledge Graph für wissenschaftliche Sammlungen : Generierung von Linked Open Data für heterogene museale Sammlungen auf der Basis des ASCH-Modells

10.15771/ma_2019_1 ◽

2019 ◽

Author(s):

◽

Antje Niemann

Keyword(s):

Open Data ◽

Linked Open Data ◽

Knowledge Graph

Download Full-text

Chapter 9. Digging into the Mensural Music Knowledge Graph. Renaissance Polyphony meets Linked Open Data

10.5771/9783956506611-167 ◽

2021 ◽

pp. 167-180

Author(s):

Richard P. Smiraglia ◽

James Bradford Young ◽

Marnix van Berchum

Keyword(s):

Open Data ◽

Linked Open Data ◽

Knowledge Graph ◽

Renaissance Polyphony

Download Full-text

WarSampo knowledge graph: Finland in the Second World War as Linked Open Data

Semantic Web ◽

10.3233/sw-200392 ◽

2020 ◽

pp. 1-14 ◽

Cited By ~ 1

Author(s):

Mikko Koho ◽

Esko Ikkala ◽

Petri Leskinen ◽

Minna Tamper ◽

Jouni Tuominen ◽

...

Keyword(s):

Open Data ◽

Military History ◽

Second World War ◽

Linked Open Data ◽

Knowledge Graph ◽

World War ◽

Automatic Data ◽

Data Infrastructure ◽

Sequence Of Events ◽

Second World

The Second World War (WW2) is arguably the most devastating catastrophe of human history, a topic of great interest to not only researchers but the general public. However, data about the Second World War is heterogeneous and distributed in various organizations and countries making it hard to utilize. In order to create aggregated global views of the war, a shared ontology and data infrastructure is needed to harmonize information in various data silos. This makes it possible to share data between publishers and application developers, to support data analysis in Digital Humanities research, and to develop data-driven intelligent applications. As a first step towards these goals, this article presents the WarSampo knowledge graph (KG), a shared semantic infrastructure, and a Linked Open Data (LOD) service for publishing data about WW2, with a focus on Finnish military history. The shared semantic infrastructure is based on the idea of representing war as a spatio-temporal sequence of events that soldiers, military units, and other actors participate in. The used metadata schema is an extension of CIDOC CRM, supplemented by various military history domain ontologies. With an infrastructure containing shared ontologies, maintaining the interlinked data brings upon new challenges, as one change in an ontology can propagate across several datasets that use it. To support sustainability, a repeatable automatic data transformation and linking pipeline has been created for rebuilding the whole WarSampo KG from the individual source datasets. The WarSampo KG is hosted on a data service based on W3C Semantic Web standards and best practices, including content negotiation, SPARQL API, download, automatic documentation, and other services supporting the reuse of the data. The WarSampo KG, a part of the international LOD Cloud and totalling ca. 14 million triples, is in use in nine end-user application views of the WarSampo portal, which has had over 690 000 end users since its opening in 2015.

Download Full-text

Parlamenttisampo: eduskunnan aineistojen linkitetyn avoimen datan palvelu ja sen käyttömahdollisuudet

Informaatiotutkimus ◽

10.23978/inf.107899 ◽

2021 ◽

Vol 40 (3) ◽

Author(s):

Eero Hyvönen ◽

Laura Sinikallio ◽

Petri Leskinen ◽

Senka Drobac ◽

Jouni Tuominen ◽

...

Keyword(s):

Open Data ◽

Linked Open Data ◽

Knowledge Graph ◽

Graph Data

Semanttinen parlamentti -hankkeessa 2020–2022 luodaan eduskunnan tietokannoista ja niihin liittyvistä muista aineistoista uudenlainen linkitetyn avoimen datan (Linked Open Data, LOD) palvelu, tietoinfrastruktuuri ja semanttinen portaali Parlamenttisampo – eduskunta semanttisessa webissä, joiden avulla tutkitaan poliittista kulttuuria ja kieltä. Dataa linkittämällä voi-daan rikastaa eduskuntadataa muilla tietolähteillä kuten biografisella tiedolla, terminologioilla ja lainsäädännön dokumenteilla. Parlamenttisampo on kieli- ja semanttisen webin teknologioihin perustuva palvelukokonaisuus tutkijoita, kansalaisia, mediaa ja valtionhallintoa varten. Artikkelissa esitellään hankkeen visio, ensimmäisiä tuloksia ja niiden hyödyntämismahdollisuuksia: Eduskunnan kaikkien täysistuntojen 1907–2021 yli 900 000 puheesta on valmistunut linkitetyn datan tietämysgraafi (knowledge graph); data on myös saatavilla XML-muodossa, jossa hyödynnetään uutta kansainvälistä Parla-CLARIN-formaattia. Ensimmäistä kertaa eduskunnan puheiden koko aikasarja on muunnettu dataksi ja datapalveluksi yhtenäisessä muodossa. Lisäksi puheet on yhdistetty eduskunnan kansanedustajien tietokannasta luotuun ja muista tietolähteistä rikastettuun toiseen tietämysgraafiin laajemmaksi ontologiaperustaiseksi datapalveluksi Fin- Parla. Datapalvelua voidaan käyttää eduskuntatutkimukseen parlamentaarisesta ja edustuksel-lisesta kulttuurista sekä poliittisen kielen käytöstä analysoimalla kansanedustajien täysistunnoissa pitämiä puheita ja poliitikkojen verkostoja data-analyysin keinoin. Palvelun rajapinnan avulla voidaan myös kehittää eri käyttäjäryhmille sovelluksia, kuten hankkeessa valmistuva Parlamenttisampo.fi-portaali.

Download Full-text

TOWARD A LINKED OPEN DATA REPOSITORY ABOUT VIETNAMESE TOURISM

KỶ YẾU HỘI NGHỊ KHOA HỌC CÔNG NGHỆ QUỐC GIA LẦN THỨ XI NGHIÊN CỨU CƠ BẢN VÀ ỨNG DỤNG CÔNG NGHỆ THÔNG TIN ◽

10.15625/vap.2018.00067 ◽

2018 ◽

Author(s):

Le Anh Tien ◽

Cao Tuan Dung

Keyword(s):

Open Data ◽

Linked Open Data ◽

Data Repository

Download Full-text

Opportunités et défis. Linked (Open) Data

Dialogues avec la machine - Arabesques ◽

10.35562/arabesques.1397 ◽

2019 ◽

pp. 4-6

Author(s):

Makx Dekkers

Keyword(s):

Open Data ◽

Linked Open Data

Download Full-text

Europeana no Linked Open Data: conceitos de Web Semântica na dimensão aplicada das humanidades digitais

Pesquisa Brasileira em Ciência da Informação e Biblioteconomia ◽

10.22478/ufpb.1981-0695.2017v12n2.36529 ◽

2017 ◽

Vol 12 (2) ◽

Author(s):

Caio Saraiva Coneglian ◽

José Eduardo Santarem Segundo

Keyword(s):

Linked Data ◽

Open Data ◽

Linked Open Data

O surgimento de novas tecnologias, tem introduzido meios para a divulgação e a disponibilização das informações mais eficientemente. Uma iniciativa, chamada de Europeana, vem promovendo esta adaptação dos objetos informacionais dentro da Web, e mais especificamente no Linked Data. Desta forma, o presente estudo tem como objetivo apresentar uma discussão acerca da relação entre as Humanidades Digitais e o Linked Open Data, na figura da Europeana. Para tal, utilizamos uma metodologia exploratória e que busca explorar as questões relacionadas ao modelo de dados da Europeana, EDM, por meio do SPARQL. Como resultados, compreendemos as características do EDM, pela utilização do SPARQL. Identificamos, ainda, a importância que o conceito de Humanidades Digitais possui dentro do contexto da Europeana.Palavras-chave: Web semântica. Linked open data. Humanidades digitais. Europeana. EDM.Link: https://periodicos.ufsc.br/index.php/eb/article/view/1518-2924.2017v22n48p88/33031

Download Full-text