Representing COVID-19 information in collaborative knowledge graphs: The case of Wikidata

Semantic Web ◽  
2021 ◽  
pp. 1-32
Author(s):  
Houcemeddine Turki ◽  
Mohamed Ali Hadj Taieb ◽  
Thomas Shafee ◽  
Tiago Lubiana ◽  
Dariusz Jemielniak ◽  
...  

Information related to the COVID-19 pandemic ranges from biological to bibliographic, from geographical to genetic and beyond. The structure of the raw data is highly complex, so converting it to meaningful insight requires data curation, integration, extraction and visualization, the global crowdsourcing of which provides both additional challenges and opportunities. Wikidata is an interdisciplinary, multilingual, open collaborative knowledge base of more than 90 million entities connected by well over a billion relationships. It acts as a web-scale platform for broader computer-supported cooperative work and linked open data, since it can be written to and queried in multiple ways in near real time by specialists, automated tools and the public. The main query language, SPARQL, is a semantic language used to retrieve and process information from databases saved in Resource Description Framework (RDF) format. Here, we introduce four aspects of Wikidata that enable it to serve as a knowledge base for general information on the COVID-19 pandemic: its flexible data model, its multilingual features, its alignment to multiple external databases, and its multidisciplinary organization. The rich knowledge graph created for COVID-19 in Wikidata can be visualized, explored, and analyzed for purposes like decision support as well as educational and scholarly research.
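As a minimal illustration of the kind of retrieval the abstract describes, the following Python sketch sends a SPARQL query to the public Wikidata Query Service. The query itself is our own illustrative example, not one taken from the paper; P921 ("main subject") and Q84263196 (assumed here to identify COVID-19) are standard Wikidata identifiers.

# Minimal sketch: query the public Wikidata SPARQL endpoint for items whose
# "main subject" (P921) is COVID-19 (Q84263196). Illustrative only; the query
# is not taken from the paper.
import requests

ENDPOINT = "https://query.wikidata.org/sparql"
QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P921 wd:Q84263196 .            # main subject: COVID-19
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
"""

response = requests.get(
    ENDPOINT,
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "example-script/0.1"},
)
for row in response.json()["results"]["bindings"]:
    print(row["item"]["value"], row["itemLabel"]["value"])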

2021 ◽  
Author(s):  
Anita Bandrowski ◽  
Jeffrey S. Grethe ◽  
Anna Pilko ◽  
Tom Gillespie ◽  
Gabi Pine ◽  
...  

The NIH Common Fund's Stimulating Peripheral Activity to Relieve Conditions (SPARC) initiative is a large-scale program that seeks to accelerate the development of therapeutic devices that modulate electrical activity in nerves to improve organ function. Integral to the SPARC program are the rich anatomical and functional datasets produced by investigators across the SPARC consortium that provide key details about organ-specific circuitry, including structural and functional connectivity, mapping of cell types and molecular profiling. These datasets are provided to the research community through an open data platform, the SPARC Portal. To ensure SPARC datasets are Findable, Accessible, Interoperable and Reusable (FAIR), they are all submitted to the SPARC Portal following a standard scheme established by the SPARC Curation Team, called the SPARC Data Structure (SDS). Inspired by the Brain Imaging Data Structure (BIDS), the SDS has been designed to capture the large variety of data generated by SPARC investigators, who come from all fields of biomedical research. Here we present the rationale and design of the SDS, including a description of the SPARC curation process and of the automated tools for complying with the SDS, namely the SDS validator and Software to Organize Data Automatically (SODA) for SPARC. The objective is to provide detailed guidelines for anyone wishing to comply with the SDS. Since the SDS is suitable for any type of biomedical research data, it can be adopted by any group wishing to follow the FAIR data principles for managing their data, even outside of the SPARC consortium. Finally, this manuscript provides a foundational framework that can be used by any organization wishing either to adapt the SDS to the specific needs of their data or to design their own FAIR data sharing scheme from scratch.


Author(s):  
Olga A. Lavrenova ◽  
Andrey A. Vinberg

The goal of any library is to ensure high quality and general availability of information retrieval tools. The paper describes the project implemented by the Russian State Library (RSL) to present the Library Bibliographic Classification as a Networked Knowledge Organization System. The project goal is to support content and provide tools for ensuring the system's interoperability with other resources of the same nature (i.e. with Linked Data vocabularies) in the global network environment. The project was partially supported by the Russian Foundation for Basic Research (RFBR). The RSL General Classified Catalogue (GCC) was selected as the main data source for the classification system of knowledge organization. The meaning of each classification number is expressed by the complete string of wordings (captions), rather than the last-level caption alone. Data converted to Resource Description Framework (RDF) files, based on the standard set of properties defined in the Simple Knowledge Organization System (SKOS) model, were loaded into a semantic store for subsequent processing using the SPARQL query language. To enrich user queries when searching for resources, the RSL has published its classification system as Linked Open Data (https://lod.rsl.ru) for searching the RSL electronic catalogue. Currently, work is underway to enable its smooth integration with other LOD vocabularies. The SKOS mapping properties are used to differentiate the types of connections between SKOS elements (concepts) that exist in different concept schemes, for example UDC, MeSH and authority data. The conceptual schemes of the leading classifications are fundamentally different from each other, so establishing correspondences between concepts is possible only on the basis of lexical and structural analysis that computes concept similarity as a combination of attributes. The authors look forward to working with libraries in Russia and other countries to create a common space of Linked Open Data vocabularies.
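A minimal sketch of how a single classification entry might be expressed as a SKOS concept, using Python and rdflib. The URIs, notation, captions and the cross-scheme target below are invented for illustration and do not reproduce actual RSL or UDC data.

# Minimal sketch: one (invented) classification entry as a SKOS concept.
# URIs, notation and captions are hypothetical, not real RSL data.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import RDF, SKOS

g = Graph()
scheme = URIRef("https://lod.rsl.ru/example/scheme")          # hypothetical URI
concept = URIRef("https://lod.rsl.ru/example/concept/28-0")   # hypothetical URI
parent = URIRef("https://lod.rsl.ru/example/concept/28")      # hypothetical URI

g.add((concept, RDF.type, SKOS.Concept))
g.add((concept, SKOS.inScheme, scheme))
g.add((concept, SKOS.notation, Literal("28.0")))
# The complete string of captions, not just the last-level caption:
g.add((concept, SKOS.prefLabel,
       Literal("Natural sciences -- Biology -- General biology", lang="en")))
g.add((concept, SKOS.broader, parent))
# Cross-scheme alignment to another vocabulary (hypothetical target):
g.add((concept, SKOS.closeMatch, URIRef("http://udcdata.info/example/57")))

print(g.serialize(format="turtle"))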


F1000Research ◽  
2019 ◽  
Vol 8 ◽  
pp. 1677
Author(s):  
Toshiaki Katayama ◽  
Shuichi Kawashima ◽  
Gos Micklem ◽  
Shin Kawano ◽  
Jin-Dong Kim ◽  
...  

Publishing databases in the Resource Description Framework (RDF) model is becoming widely accepted as a way to maximize the syntactic and semantic interoperability of open data in the life sciences. Here we report advancements made in the 6th and 7th annual BioHackathons, which were held in Tokyo and Miyagi, respectively. This review consists of two major sections covering: 1) improvement and utilization of RDF data in various domains of the life sciences, and 2) metadata about these RDF data, the resources that store them, and the service quality of SPARQL Protocol and RDF Query Language (SPARQL) endpoints. The first section describes how we developed RDF data, ontologies and tools in genomics, proteomics, metabolomics and glycomics, and through literature text mining. The second section describes how we defined descriptions of datasets, the provenance of data, and quality assessment of services and service discovery. By enhancing the harmonization of these two layers of machine-readable data and knowledge, we improve the way community-wide resources are developed and published. Moreover, we outline best practices for the future, and prepare ourselves for an exciting and unpredictable variety of real-world applications in the coming years.
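As a hedged illustration of the endpoint-level metadata discussed in the second section, the sketch below asks a SPARQL endpoint to enumerate its named graphs and count their triples. The endpoint URL is a placeholder and the query is our own example, not one taken from the BioHackathon work.

# Minimal sketch: probe a SPARQL endpoint for per-graph triple counts, the kind
# of dataset-level metadata discussed in the review. Endpoint URL is a placeholder.
import requests

ENDPOINT = "https://example.org/sparql"  # placeholder endpoint
QUERY = """
SELECT ?g (COUNT(*) AS ?triples) WHERE {
  GRAPH ?g { ?s ?p ?o }
}
GROUP BY ?g
ORDER BY DESC(?triples)
LIMIT 20
"""

resp = requests.get(ENDPOINT, params={"query": QUERY},
                    headers={"Accept": "application/sparql-results+json"})
resp.raise_for_status()
for row in resp.json()["results"]["bindings"]:
    print(row["g"]["value"], row["triples"]["value"])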


Author(s):  
Hannah Hamilton ◽  
Stefano De Paoli ◽  
Anna Wilson ◽  
Greg Singh

In this paper we discuss the challenges and opportunities of giving open data a life beyond its original public release. Indeed, it is often unclear whether open data has a life beyond the one it was initially collected for, to the extent that some authors have described the public reuse of government data as no more than a "myth". We present the results of the Data Commons Scotland project, launched with the idea of creating an Internet-based prototype platform for a trustworthy commons of open data, thus facilitating a life for data beyond that given by the original producer. We discuss the results of our empirical research for the project, based on 31 qualitative interviews with a range of actors such as data producers and citizens. Moreover, we present the results of the co-design process conducted for the Data Commons Scotland platform. Drawing on this analysis, we reflect on the challenges of building Internet-based platforms for open data that support the generation of a commons.


FACETS ◽  
2019 ◽  
Vol 4 (1) ◽  
pp. 1-19
Author(s):  
Palmira Granados Moreno ◽  
Sarah E. Ali-Khan ◽  
Benjamin Capps ◽  
Timothy Caulfield ◽  
Damien Chalaud ◽  
...  

Open science can significantly influence the development and translational process of precision medicine in Canada. Precision medicine presents a unique opportunity to improve disease prevention and healthcare, as well as to reduce health-related expenditures. However, the development of precision medicine also brings economic challenges, such as costly development, high failure rates, and reduced market size in comparison with the traditional blockbuster drug development model. Open science, characterized by the principles of open data sharing, fast dissemination of knowledge, cumulative research, and cooperation, presents a unique opportunity to address these economic challenges while also promoting the public good. The Centre of Genomics and Policy at McGill University organized a stakeholders' workshop in Montreal in March 2018. The workshop, entitled "Could Open be the Yellow Brick Road to Precision Medicine?", provided a forum for stakeholders to share experiences and identify common objectives, challenges, and needs to be addressed to promote open science initiatives in precision medicine. The rich presentations and exchanges that took place during the meeting resulted in this consensus paper containing key considerations for open science precision medicine in Canada. Stakeholders would benefit from addressing these considerations in order to promote a more coherent and dynamic open science ecosystem for precision medicine.


2013 ◽  
Vol 07 (04) ◽  
pp. 455-477 ◽  
Author(s):  
EDGARD MARX ◽  
TOMMASO SORU ◽  
SAEEDEH SHEKARPOUR ◽  
SÖREN AUER ◽  
AXEL-CYRILLE NGONGA NGOMO ◽  
...  

Over the last few years, a considerable amount of structured data has been published on the Web as Linked Open Data (LOD). Despite recent advances, consuming and using Linked Open Data within an organization is still a substantial challenge. Many LOD datasets are quite large and, despite progress in Resource Description Framework (RDF) data management, loading and querying them in a triple store is extremely time-consuming and resource-demanding. To overcome this consumption obstacle, we propose a process inspired by the classical Extract-Transform-Load (ETL) paradigm. In this article, we focus particularly on the selection and extraction steps of this process. We devise a fragment of the SPARQL Protocol and RDF Query Language (SPARQL), dubbed SliceSPARQL, which enables the selection of well-defined slices of datasets fulfilling typical information needs. SliceSPARQL supports graph patterns in which each connected subgraph pattern involves at most one variable or Internationalized Resource Identifier (IRI) in its join conditions. This restriction guarantees efficient processing of the query against a sequential dataset dump stream. Furthermore, we evaluate our slicing approach with three different optimization strategies. The results show that dataset slices can be generated an order of magnitude faster than with the conventional approach of loading the whole dataset into a triple store.
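A rough sketch of the streaming idea behind slicing: because each connected subgraph pattern joins on at most one variable or IRI, a slice can be collected by sequential passes over an N-Triples dump rather than by loading it into a triple store. The pattern, file name and two-pass strategy below are our own illustration under those assumptions; this is not the authors' SliceSPARQL implementation.

# Rough sketch of slicing over a sequential N-Triples dump: keep every triple
# whose subject also has rdf:type foaf:Person somewhere in the stream.
# File name and pattern are hypothetical; this is not the SliceSPARQL tool.
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
FOAF_PERSON = "<http://xmlns.com/foaf/0.1/Person>"

def slice_dump(path):
    matching_subjects = set()
    # Pass 1: find subjects matching the anchoring triple pattern.
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            parts = line.split(None, 2)
            if len(parts) == 3 and parts[1] == RDF_TYPE and parts[2].startswith(FOAF_PERSON):
                matching_subjects.add(parts[0])
    # Pass 2: emit every triple about those subjects.
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            if line.split(None, 1)[0] in matching_subjects:
                yield line.rstrip("\n")

# for triple in slice_dump("dataset.nt"):  # hypothetical dump file
#     print(triple)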


Author(s):  
W. Beek ◽  
E. Folmer ◽  
L. Rietveld ◽  
J. Walker

The Netherlands' Cadastre, Land Registry and Mapping Agency – in short, Kadaster – collects and registers administrative and spatial data on property and the rights involved, including registrations for ships, aircraft and telecommunications networks. In doing so, Kadaster protects legal certainty. Kadaster publishes many large authoritative datasets, including several key registers of the Dutch government (Topography, Addresses and Buildings). Furthermore, Kadaster develops and maintains the PDOK shared service, through which about 100 spatial datasets are published in several formats, covering a vast number of detailed geospatial objects. Geospatial objects include all plots of land, all buildings, all roads and all lampposts. These objects are spatially and/or conceptually related, but are maintained by different data curators. As a result, these datasets are syntactically and architecturally disjoint, and using them together currently requires non-trivial human labor.

In response to this, Kadaster is currently publishing its geospatial data assets as Linked Open Data. The standardized query language for Linked Open Geodata is GeoSPARQL. Unfortunately, current tooling does not support writing and evaluating GeoSPARQL queries. This paper presents GeoYASGUI, a GeoSPARQL editor and result-set viewer with IDE capabilities. GeoYASGUI is not a new software product, but an integration of, and a collection of updates to, existing open source libraries. With GeoYASGUI it becomes possible to query the rich open data assets of Kadaster.
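A minimal sketch of the kind of GeoSPARQL query an editor like GeoYASGUI targets, issued here from Python. The endpoint URL and the example WKT polygon are assumptions for illustration; they do not reflect Kadaster's actual endpoints or data model, and the query is not taken from the paper.

# Minimal sketch: send a GeoSPARQL query (geof:sfWithin spatial filter) to a
# SPARQL endpoint. Endpoint URL and WKT literal are illustrative assumptions.
import requests

ENDPOINT = "https://example.org/sparql"  # placeholder; not the real Kadaster endpoint
QUERY = """
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>

SELECT ?feature ?wkt WHERE {
  ?feature geo:hasGeometry ?geom .
  ?geom geo:asWKT ?wkt .
  FILTER(geof:sfWithin(?wkt,
    "POLYGON((5.0 52.0, 5.2 52.0, 5.2 52.2, 5.0 52.2, 5.0 52.0))"^^geo:wktLiteral))
}
LIMIT 25
"""

resp = requests.post(ENDPOINT, data={"query": QUERY},
                     headers={"Accept": "application/sparql-results+json"})
for row in resp.json()["results"]["bindings"]:
    print(row["feature"]["value"])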


2022 ◽  
Vol 14 (1) ◽  
pp. 1-9
Author(s):  
Saravanan Thirumuruganathan ◽  
Mayuresh Kunjir ◽  
Mourad Ouzzani ◽  
Sanjay Chawla

The data and artificial intelligence (AI) revolution has had a massive impact on enterprises, governments, and society alike. It is fueled by two key factors. First, data have become increasingly abundant and are often available openly. Enterprises have more data than they can process. Governments are spearheading open data initiatives by setting up data portals such as data.gov and releasing large amounts of data to the public. Second, AI engineering development is becoming increasingly democratized. Open source frameworks have enabled even an individual developer to engineer sophisticated AI systems. But with such ease of use comes the potential for irresponsible use of data. Ensuring that AI systems adhere to a set of ethical principles is one of the major problems of our age. We believe that data and model transparency has a key role to play in mitigating the deleterious effects of AI systems. In this article, we describe a framework that synthesizes ideas from various domains, such as data transparency, data quality, and data governance, among others, to tackle this problem. Specifically, we advocate an approach based on automated annotations (of both the data and the AI model), which has a number of appealing properties. The annotations could be used by enterprises to gain visibility into potential issues, prepare data transparency reports, create and ensure policy compliance, and evaluate the readiness of data for diverse downstream AI applications. We propose a model architecture and enumerate the key components that could achieve these requirements. Finally, we describe a number of interesting challenges and opportunities.
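As a hedged sketch of what an automated data annotation could look like, the snippet below computes simple column-level statistics for a tabular dataset and serializes them as a JSON annotation record. The field names and statistics are our own invention for illustration, not the annotation framework proposed in the article.

# Hedged sketch: compute simple column-level annotations for a tabular dataset
# and emit them as a JSON record. Field names are invented for illustration;
# this is not the framework proposed in the article.
import json
import pandas as pd

def annotate(df: pd.DataFrame) -> dict:
    annotation = {"n_rows": len(df), "columns": {}}
    for col in df.columns:
        series = df[col]
        annotation["columns"][col] = {
            "dtype": str(series.dtype),
            "missing_fraction": float(series.isna().mean()),
            "n_unique": int(series.nunique(dropna=True)),
        }
    return annotation

if __name__ == "__main__":
    # Tiny synthetic example, purely illustrative.
    df = pd.DataFrame({"age": [34, 51, None], "country": ["FR", "FR", "DE"]})
    print(json.dumps(annotate(df), indent=2))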


2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Olwen Purdue

This article explores the challenges and opportunities presented for the teaching and practice of public history in a post-conflict society that remains deeply divided over its past. It examines some of the negative ways in which history is used in the public arena, but also the potential of public history initiatives for building a more cohesive and forward-looking society. It considers how students can use the rich cultural landscape of Northern Ireland and engage with a wide range of experienced practitioners to learn more about the ways in which history divides; how we can negotiate these divisions over interpretations; how different communities understand, represent, and engage with their past; and why this matters.


2014 ◽  
Vol 7 (2) ◽  
pp. 197-211
Author(s):  
James Crossley

Using the 400th anniversary of the King James Bible as a test case, this article illustrates some of the important ways in which the Bible is understood and consumed and how it has continued to survive in an age of neoliberalism and postmodernity. It is clear that instant recognition of the Bible-as-artefact, multiple repackaging and pithy biblical phrases, combined with a popular nationalism, provide distinctive strands of this understanding and survival. It is also clear that the KJV is seen as a key part of a proud English cultural heritage and tied in with traditions of democracy and tolerance, despite having next to nothing to do with either. Anything potentially problematic for Western liberal discourse (e.g. calling outsiders "dogs," smashing babies' heads against rocks, Hades-fire for the rich, killing heretics, using the Bible to convert and colonize, etc.) is effectively removed, or even encouraged to be removed, from such discussions of the KJV and the Bible in the public arena. In other words, this is a decaffeinated Bible that has been colonized by, and has adapted to, Western liberal capitalism.

