A hands-on introduction to querying evolutionary relationships across multiple data sources using SPARQL

F1000Research ◽  
2019 ◽  
Vol 8 ◽  
pp. 1822 ◽  
Author(s):  
Ana Claudia Sima ◽  
Christophe Dessimoz ◽  
Kurt Stockinger ◽  
Monique Zahn-Zabal ◽  
Tarcisio Mendes de Farias

The increasing use of Semantic Web technologies in the life sciences, in particular the use of the Resource Description Framework (RDF) and the RDF query language SPARQL, opens the path for novel integrative analyses, combining information from multiple sources. However, analyzing evolutionary data in RDF is not trivial, due to the steep learning curve required to understand both the data models adopted by different RDF data sources, as well as the SPARQL query language. In this article, we provide a hands-on introduction to querying evolutionary data across multiple sources that publish orthology information in RDF, namely: The Orthologous MAtrix (OMA), the European Bioinformatics Institute (EBI) RDF platform, the Database of Orthologous Groups (OrthoDB) and the Microbial Genome Database (MBGD). We present four protocols in increasing order of complexity. In these protocols, we demonstrate through SPARQL queries how to retrieve pairwise orthologs, homologous groups, and hierarchical orthologous groups. Finally, we show how orthology information in different sources can be compared, through the use of federated SPARQL queries.

F1000Research ◽  
2020 ◽  
Vol 8 ◽  
pp. 1822
Author(s):  
Ana Claudia Sima ◽  
Christophe Dessimoz ◽  
Kurt Stockinger ◽  
Monique Zahn-Zabal ◽  
Tarcisio Mendes de Farias

The increasing use of Semantic Web technologies in the life sciences, in particular the use of the Resource Description Framework (RDF) and the RDF query language SPARQL, opens the path for novel integrative analyses, combining information from multiple data sources. However, analyzing evolutionary data in RDF is not trivial, due to the steep learning curve required to understand both the data models adopted by different RDF data sources, as well as the equivalent SPARQL constructs required to benefit from this data – in particular, recursive property paths. In this article, we provide a hands-on introduction to querying evolutionary data across several data sources that publish orthology information in RDF, namely: The Orthologous MAtrix (OMA), the European Bioinformatics Institute (EBI) RDF platform, the Database of Orthologous Groups (OrthoDB) and the Microbial Genome Database (MBGD). We present four protocols in increasing order of complexity. In these protocols, we demonstrate through SPARQL queries how to retrieve pairwise orthologs, homologous groups, and hierarchical orthologous groups. Finally, we show how orthology information in different data sources can be compared, through the use of federated SPARQL queries.
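
The recursive property paths mentioned in this abstract can be illustrated without a SPARQL endpoint. The sketch below emulates a one-or-more path (written `predicate+` in SPARQL 1.1) over a small in-memory list of triples, which is roughly what retrieving all genes nested inside a hierarchical orthologous group involves. The predicate name `hasMember`, the `hog:` and `gene:` identifiers, and the helper names are hypothetical and are not taken from the article or from any of the listed data sources.

```python
from collections import deque

# Toy triples (subject, predicate, object). All identifiers are hypothetical.
TRIPLES = [
    ("hog:root", "hasMember", "hog:sub1"),
    ("hog:root", "hasMember", "gene:A"),
    ("hog:sub1", "hasMember", "gene:B"),
    ("hog:sub1", "hasMember", "hog:sub2"),
    ("hog:sub2", "hasMember", "gene:C"),
    ("hog:other", "hasMember", "gene:D"),
]


def reachable(triples, start, predicate):
    """Return every node reachable from `start` via one or more `predicate` edges.

    This mirrors what the SPARQL pattern `start predicate+ ?x` binds ?x to.
    """
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for s, p, o in triples:
            # Follow only outgoing edges with the requested predicate,
            # and visit each object once so cycles cannot loop forever.
            if s == node and p == predicate and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen


def genes_in_group(triples, group):
    """List the genes nested at any depth below `group`, sorted by identifier."""
    members = reachable(triples, group, "hasMember")
    return sorted(m for m in members if m.startswith("gene:"))
```

With the toy data, `genes_in_group(TRIPLES, "hog:root")` returns `['gene:A', 'gene:B', 'gene:C']`: the nested subgroups are traversed, while `gene:D` in the unrelated group is left out.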


Author(s):  
Seán O’Riain ◽  
Andreas Harth ◽  
Edward Curry

With increased dependence on the efficient use and inclusion of diverse corporate and Web-based data sources for business information analysis, financial information providers will increasingly need agile information integration capabilities. Linked Data is a set of technologies and best practices that provide such a level of agility for information integration, access, and use. Current approaches struggle to cope with the inclusion of multiple data sources in near real time, and have looked to Semantic Web technologies for assistance with infrastructure access and with handling multiple data formats and their vocabularies. This chapter discusses the challenges of financial data integration, provides the component architecture of Web-enabled financial data integration, and outlines the emergence of a financial ecosystem based upon the use of existing Web standards. Introductions to Semantic Web technologies are given, and the chapter supports this with insight and discussion gathered from multiple financial services use case implementations. Finally, best practices for integrating Web data based on the Linked Data principles and emergent areas are described.


2019 ◽  
pp. 249-257
Author(s):  
Yassine Laadidi ◽  
Mohamed Bahaj

The evolution of web technologies and of the data we manipulate heralds profound changes in Business Intelligence (BI) systems and opens up important research and innovation, particularly in multidimensional data modeling and data integration. The emergence of the Semantic Web highlights the need to include external data sources in the BI system. The Semantic Web introduced the Resource Description Framework (RDF) model to describe data on the Web by annotating resources with semantics and properties, thereby enabling reasoning mechanisms. However, integrating and/or analyzing information from World Wide Web sources is still a very challenging process because of their “unpredictability” and heterogeneity. Consequently, a transition to an open BI/SW system is required to handle automatic alterations of structures and to enable the discovery of multidimensional entities across multiple Web sources. In this paper, we introduce our prospective approach and architecture for including external data sources in an open BI/SW system, and we provide an automatic method for defining multidimensional entities and properties over different sources for data acquisition and data analysis requests.


2008 ◽  
pp. 3309-3320
Author(s):  
Csilla Farkas

This chapter investigates the threat of unwanted Semantic Web inferences. We survey the current efforts to detect and remove unwanted inferences, identify research gaps, and recommend future research directions. We begin with a brief overview of Semantic Web technologies and reasoning methods, followed by a description of the inference problem in traditional databases. In the context of the Semantic Web, we study two types of inferences: (1) entailments defined by the formal semantics of the Resource Description Framework (RDF) and the RDF Schema (RDFS) and (2) inferences supported by semantic languages like the Web Ontology Language (OWL). We compare the Semantic Web inferences to the inferences studied in traditional databases. We show that the inference problem exists on the Semantic Web and that existing security methods do not fully prevent indirect data disclosure via inference channels.
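
The first kind of inference named in this abstract, RDFS entailment, can be sketched in a few lines. The example below applies two entailment rules from the RDF Semantics specification, rdfs11 (transitivity of `rdfs:subClassOf`) and rdfs9 (type propagation along `rdfs:subClassOf`), until no new triples appear. It is a minimal sketch rather than the chapter's method; the `ex:` resources and the function name are hypothetical.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def rdfs_closure(triples):
    """Return `triples` plus everything derivable with rules rdfs9 and rdfs11."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            if p != SUBCLASS:
                continue
            for s2, p2, o2 in inferred:
                # rdfs11: (s subClassOf o), (o subClassOf o2) => (s subClassOf o2)
                if p2 == SUBCLASS and s2 == o:
                    new.add((s, SUBCLASS, o2))
                # rdfs9: (s subClassOf o), (s2 type s) => (s2 type o)
                if p2 == TYPE and o2 == s:
                    new.add((s2, TYPE, o))
        if new <= inferred:
            # Fixed point reached: nothing further can be derived.
            return inferred
        inferred |= new


# Hypothetical data: only the most specific class membership is stated.
ASSERTED = {
    ("ex:alice", TYPE, "ex:OncologyPatient"),
    ("ex:OncologyPatient", SUBCLASS, "ex:Patient"),
    ("ex:Patient", SUBCLASS, "ex:Person"),
}
```

In this toy data the inference runs from specific to general. The disclosure risk discussed in the chapter arises when a published subset of the data, combined with schema knowledge, lets a reader derive a triple that was meant to stay hidden.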


2020 ◽  
Vol 1 (1) ◽  
pp. 428-444 ◽  
Author(s):  
Silvio Peroni ◽  
David Shotton

OpenCitations is an infrastructure organization for open scholarship dedicated to the publication of open citation data as Linked Open Data using Semantic Web technologies, thereby providing a disruptive alternative to traditional proprietary citation indexes. Open citation data are valuable for bibliometric analysis, increasing the reproducibility of large-scale analyses by enabling publication of the source data. Following brief introductions to the development and benefits of open scholarship and to Semantic Web technologies, this paper describes OpenCitations and its data sets, tools, services, and activities. These include the OpenCitations Data Model; the SPAR (Semantic Publishing and Referencing) Ontologies; OpenCitations’ open software of generic applicability for searching, browsing, and providing REST APIs over resource description framework (RDF) triplestores; Open Citation Identifiers (OCIs) and the OpenCitations OCI Resolution Service; the OpenCitations Corpus (OCC), a database of open downloadable bibliographic and citation data made available in RDF under a Creative Commons public domain dedication; and the OpenCitations Indexes of open citation data, of which the first and largest is COCI, the OpenCitations Index of Crossref Open DOI-to-DOI Citations, which currently contains over 624 million bibliographic citations and is receiving considerable usage by the scholarly community.


2018 ◽  
Vol 10 (8) ◽  
pp. 2613
Author(s):  
Dandan He ◽  
Zhongfu Li ◽  
Chunlin Wu ◽  
Xin Ning

Industrialized construction has raised the requirements for procurement methods used in the construction industry. The rapid development of e-commerce offers efficient and effective solutions; however, the large number of participants in the construction industry means that the data involved are complex, and problems arise related to volume, heterogeneity, and fragmentation. Thus, the sector lags behind others in the adoption of e-commerce. In particular, data integration has become a barrier preventing further development. Traditional e-commerce platforms, which were designed to integrate common product data, cannot meet the requirements of construction product data integration. This study aimed to build an information-integrated e-commerce platform for industrialized construction procurement (ICP) to overcome some of the shortcomings of existing platforms. We proposed a platform based on Building Information Modelling (BIM) and linked data, taking an innovative approach to data integration. It uses industrialized construction technology to support product standardization, BIM to support the procurement process, and linked data to connect different data sources. The platform was validated using a case study. With the development of an e-commerce ontology, industrialized construction component information was extracted from BIM models and converted to Resource Description Framework (RDF) format. Related information from different data sources was also converted to RDF format, and SPARQL Protocol and RDF Query Language (SPARQL) queries were implemented. The platform provides a solution for the development of e-commerce platforms in the construction industry.


2010 ◽  
Vol 04 (04) ◽  
pp. 423-451 ◽  
Author(s):  
SUNITHA RAMANUJAM ◽  
VAIBHAV KHADILKAR ◽  
LATIFUR KHAN ◽  
MURAT KANTARCIOGLU ◽  
BHAVANI THURAISINGHAM ◽  
...  

The current buzzword in the Internet community is the Semantic Web initiative proposed by the W3C to yield a Web that is more flexible and self-adapting. However, for the Semantic Web initiative to become a reality, heterogeneous data sources need to be integrated in order to enable access to them in a homogeneous manner. Since a vast majority of data currently resides in relational databases, integrating relational data sources with semantic web technologies is at the top of the list of activities required to realize the semantic web vision. Several efforts exist that publish relational data as Resource Description Framework (RDF) triples; however almost all current work in this arena is uni-directional, presenting data from an underlying relational database into a corresponding virtual RDF store in a read-only manner. An enhancement over previous relational-to-RDF bridging work in the form of bi-directionality support is presented in this paper. The bi-directional bridge proposed here allows RDF data updates specified as triples to be propagated back into the underlying relational database as tuples. Towards this end, we present various algorithms to translate the triples to be updated/inserted/deleted into equivalent relational attributes/tuples whenever possible. Particular emphasis is laid, in this paper, on the translation and update propagation process for triples containing blank nodes and reification nodes, and a platform enhanced with our algorithms, called D2RQ++, through which bi-directional translation can be achieved, is presented.
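
The idea of propagating a triple update back into a relational table can be sketched as follows. This is a minimal illustration under stated assumptions, not the D2RQ++ algorithms: the predicate-to-column mapping, the `employee/<id>` subject URI convention, the table layout, and the function names are all hypothetical, and blank nodes and reification (the paper's main focus) are not handled.

```python
import sqlite3

# Hypothetical mapping from RDF predicates to (table, column).
MAPPING = {
    "ex:name": ("employee", "name"),
    "ex:dept": ("employee", "dept"),
}


def make_db():
    """Create an in-memory database with a single illustrative table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
    return conn


def apply_triple(conn, subject, predicate, obj):
    """Translate one (subject, predicate, object) triple into a row update.

    The subject is assumed to look like "employee/<id>"; the trailing
    integer is used as the primary key of the mapped table.
    """
    table, column = MAPPING[predicate]
    key = int(subject.rsplit("/", 1)[1])
    # Table and column names come from the trusted MAPPING above;
    # the values themselves are passed as bound parameters.
    cur = conn.execute(f"UPDATE {table} SET {column} = ? WHERE id = ?", (obj, key))
    if cur.rowcount == 0:
        # No row for this subject yet: the triple creates a new tuple.
        conn.execute(f"INSERT INTO {table} (id, {column}) VALUES (?, ?)", (key, obj))
    conn.commit()
```

Applying the triples `("employee/1", "ex:name", "Ada")` and `("employee/1", "ex:dept", "R&D")` in sequence first inserts a row with id 1 and then fills in its `dept` column, so two triples about the same subject end up as one tuple.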


Author(s):  
Aatif Ahmad Khan ◽  
Sanjay Kumar Malik

Semantic search refers to a set of approaches that use Semantic Web technologies for information retrieval in order to make the process machine-understandable and to fetch precise results. Knowledge bases (KBs) act as the backbone of semantic search approaches, providing machine-interpretable information for query processing and result retrieval. These KBs include Resource Description Framework (RDF) datasets and populated ontologies. This paper presents an assessment of the largest cross-domain KBs that are exploited in large-scale semantic search and are freely available on the Linked Open Data Cloud. Analyzing these datasets is a prerequisite for modeling effective semantic search approaches, because datasets differ in their suitability for particular applications. Only large-scale, cross-domain datasets containing more than 10 million RDF triples are considered. The survey reports the size of each dataset in triples, along with the triple data format(s) it supports, which is significant for developing effective semantic search models.

