An empirical meta-analysis of the life sciences linked open data on the web

While the biomedical community has published several “open data” sources in the last decade, most researchers still endure severe logistical and technical challenges to discover, query, and integrate heterogeneous data and knowledge from multiple sources. To tackle these challenges, the community has experimented with Semantic Web and linked data technologies to create the Life Sciences Linked Open Data (LSLOD) cloud. In this paper, we extract schemas from more than 80 biomedical linked open data sources into an LSLOD schema graph and conduct an empirical meta-analysis to evaluate the extent of semantic heterogeneity across the LSLOD cloud. We observe that several LSLOD sources exist as stand-alone data sources that are not inter-linked with other sources, use unpublished schemas with minimal reuse or mappings, and have elements that are not useful for data integration from a biomedical perspective. We envision that the LSLOD schema graph and the findings from this research will aid researchers who wish to query and integrate data and knowledge from multiple biomedical sources simultaneously on the Web.

Creation and Integration of Reference Ontologies for Efficient LOD Management

Semi-Automatic Ontology Development, 2012, pp. 162-199

DOI: 10.4018/978-1-4666-0188-8.ch007

Cited By ~ 1

Author(s): Mariana Damova, Atanas Kiryakov, Maurice Grinberg, Michael K. Bergman, Frédérick Giasson, ...

Keyword(s): Real Life, Open Data, Real Data, Heterogeneous Data, Linked Open Data, Data Space, Upper Level, Reference Knowledge, Global Data, The Web

The chapter describes how two upper-level ontologies, PROTON and UMBEL, were developed into reference ontologies and integrated into the so-called Reference Knowledge Stack (RKS). It argues that the RKS is an important step in the efforts of the Linked Open Data (LOD) project to transform the Web into a global data space with diverse real data, available for review and analysis. The RKS is intended to make interoperability between published datasets much more efficient than it is now. The approach discussed in the chapter consists of developing reference layers of upper-level ontologies by mapping them to certain LOD schemata and assigning instance data to them so that they cover a reasonable portion of the LOD datasets. The chapter presents the methods (manual and semi-automatic) used to create the RKS and gives examples that illustrate its advantages for managing highly heterogeneous data and its usefulness in real-life, knowledge-intensive applications.

Enabling Web-scale data integration in biomedicine through Linked Open Data

npj Digital Medicine, 2019, Vol 2 (1)

DOI: 10.1038/s41746-019-0162-5

Cited By ~ 3

Author(s): Maulik R. Kamdar, Javier D. Fernández, Axel Polleres, Tania Tudorache, Mark A. Musen

Keyword(s): Data Integration, Biomedical Research, Semantic Processing, Open Data, Heterogeneous Data, Research Community, Linked Open Data, Biomedical Data, Semantic Web Technologies, The Web

The biomedical data landscape is fragmented with several isolated, heterogeneous data and knowledge sources, which use varying formats, syntaxes, schemas, and entity notations, existing on the Web. Biomedical researchers face severe logistical and technical challenges to query, integrate, analyze, and visualize data from multiple diverse sources in the context of available biomedical knowledge. Semantic Web technologies and Linked Data principles may aid toward Web-scale semantic processing and data integration in biomedicine. The biomedical research community has been one of the earliest adopters of these technologies and principles to publish data and knowledge on the Web as linked graphs and ontologies, hence creating the Life Sciences Linked Open Data (LSLOD) cloud. In this paper, we provide our perspective on some opportunities proffered by the use of LSLOD to integrate biomedical data and knowledge in three domains: (1) pharmacology, (2) cancer research, and (3) infectious diseases. We will discuss some of the major challenges that hinder the wide-spread use and consumption of LSLOD by the biomedical research community. Finally, we provide a few technical solutions and insights that can address these challenges. Eventually, LSLOD can enable the development of scalable, intelligent infrastructures that support artificial intelligence methods for augmenting human intelligence to achieve better clinical outcomes for patients, to enhance the quality of biomedical research, and to improve our understanding of living systems.

Applications of Semantic Web in integrating open data and bibliographic records: a development example of an infomediary of Taiwanese indigenous people

The Electronic Library, 2021, Vol ahead-of-print (ahead-of-print)

DOI: 10.1108/el-09-2020-0258

Author(s): Han-Yu Sung, Yu-Liang Chi

Keyword(s): Semantic Web, Data Integration, Indigenous People, Open Data, Heterogeneous Data, Data Sources, Content Type, Bibliographic Records, Open Government Data, The Web

Purpose: This study aims to develop a Web-based application system called Infomediary of Taiwanese Indigenous Peoples (ITIP) that can help individuals comprehend the society and culture of indigenous people. The ITIP uses Semantic Web technologies to integrate a number of data sources, in particular the bibliographic records of a museum. Moreover, an ontology model was developed to help users search cultural collections by topic concepts.

Design/methodology/approach: Two issues were identified that needed to be addressed: the integration of heterogeneous data sources and semantic-based information retrieval. Two corresponding methods were proposed: SPARQL federated queries were designed for data integration across the Web, and ontology-driven queries were designed to search semantically through knowledge inference. Furthermore, to help users perform searches easily, three search interfaces (ethnicity, region and topic) were developed to take full advantage of the content available on the Web.

Findings: Most open government data is structured but is not published in Resource Description Framework (RDF) form; Semantic Web consumers therefore require additional data conversion before the data can be used. On the other hand, although the library, archive and museum (LAM) community has produced some emerging linked data, very few data sets are released to the general public as open data. The Semantic Web’s vision of a “web of data” remains challenging.

Originality/value: This study integrated data from various institutions, including those of the LAM community. The development was carried out by non-institution members (i.e. institutional outsiders). The challenges encountered included uncertain data quality and the absence of institutional participation.

Mining the Web of Life Sciences Linked Open Data for Mechanism-Based Pharmacovigilance

Companion of the The Web Conference 2018 on The Web Conference 2018 - WWW '18, 2018

DOI: 10.1145/3184558.3186576

Author(s): Maulik R. Kamdar

Keyword(s): Life Sciences, Open Data, Linked Open Data, The Web

Discovering and linking with life sciences linked open data cloud

Proceedings of the Symposium on Applied Computing - SAC '17, 2017

DOI: 10.1145/3019612.3019933

Author(s): Muntazir Mehdi

Keyword(s): Life Sciences, Open Data, Linked Open Data

The read–write Linked Data Web

Philosophical Transactions of The Royal Society A Mathematical Physical and Engineering Sciences, 2013, Vol 371 (1987), pp. 20120513

DOI: 10.1098/rsta.2012.0513

Cited By ~ 15

Author(s): Tim Berners-Lee, Kieron O’Hara

Keyword(s): Future Development, Linked Data, Open Data, Linked Open Data, The Future, The Web

This paper discusses issues that will affect the future development of the Web, either increasing its power and utility or suppressing its development. First, it argues for the importance of the continued development of the Linked Data Web and describes the use of linked open data as an important component of that development. Second, the paper defends the Web as a read–write medium and considers how the read–write Linked Data Web could be achieved.

Publishing Statistical Data following the Linked Open Data Principles

Cases on Open-Linked Data and Semantic Web Applications, 2013, pp. 199-226

DOI: 10.4018/978-1-4666-2827-4.ch011

Cited By ~ 5

Author(s): Jose María Alvarez Rodríguez, Jules Clement, José Emilio Labra Gayo, Hania Farhan, Patricia Ordoñez de Pablos

Keyword(s): Linked Data, Statistical Data, Open Data, Linked Open Data, Dimensional Measure, The Web

This chapter describes how statistical data is promoted to the Linked Open Data initiative in the context of the Web Index project. It presents a framework for publishing raw statistics and a method for converting them to Linked Data following the W3C standards RDF, SKOS, and OWL. The case study focuses on the Web Index project: launched by the Web Foundation, the Index is the first multi-dimensional measure of the growth, utility, and impact of the Web on people and nations. Finally, the chapter evaluates the advantages of using Linked Data to publish statistics and closes with a discussion and future steps.

Web Retrieval of XML Documents

Web-Enabled Systems Integration, 2011, pp. 170-199

DOI: 10.4018/978-1-59140-041-7.ch009

Author(s): Barbara Catania, Elena Ferrari

Keyword(s): Expressive Power, Data Representation, Query Languages, Heterogeneous Data, Data Sources, Xml Data, Web Documents, Web Retrieval, Heterogeneous Data Sources, The Web

The Web is characterized by a huge number of very heterogeneous data sources that differ both in media support and in format representation. In this scenario, there is a need for an integrated approach to querying heterogeneous Web documents. To this end, XML can play an important role, since it is becoming a standard for data representation and exchange over the Web. Owing to its flexibility, XML is currently being used as an interface language over the Web, by which (parts of) document sources are represented and exported. Under this assumption, the problem of querying heterogeneous sources can be reduced to the problem of querying XML data sources. In this chapter, we first survey the most relevant query languages for XML data proposed both by the scientific community and by standardization committees, e.g., W3C, focusing mainly on their expressive power. Then, we investigate how typical Information Retrieval concepts, such as ranking, similarity-based search, and profile-based search, can be applied to XML query languages. Commercial products based on the considered approaches are then briefly surveyed. Finally, we conclude the chapter with an overview of the most promising research trends in the field.

Design and Implementation of Oilfield Heterogeneous Data Integration Model Based on Ontology

Advanced Materials Research, 2014, Vol 912-914, pp. 1201-1204

DOI: 10.4028/www.scientific.net/amr.912-914.1201

Author(s): Gang Huang, Xiu Ying Wu, Man Yuan

Keyword(s): Data Integration, Heterogeneous Data, Data Sources, Semantic Heterogeneity, Integration Model, Integration Framework, Heterogeneous Data Integration, Semantic Level, Heterogeneous Data Sources, Semantic Difference

This paper presents an ontology-based distributed heterogeneous data integration framework (ODHDIF). The framework addresses the problem of semantic interoperability between heterogeneous data sources at the semantic level. Metadata specifies the distributed, heterogeneous data and describes the semantic information of each data source. With an ontology as the common semantic model, semantic matches between heterogeneous data sources are established through ontology mapping and semantic differences are shielded, so that the semantic heterogeneity of the data sources can be effectively resolved. The framework provides an effective technical means for sharing an enterprise's internal information accurately and in a timely manner.
