IDSM ChemWebRDF: SPARQLing small-molecule datasets

AbstractThe Resource Description Framework (RDF), together with well-defined ontologies, significantly increases data interoperability and usability. The SPARQL query language was introduced to retrieve requested RDF data and to explore links between them. Among other useful features, SPARQL supports federated queries that combine multiple independent data source endpoints. This allows users to obtain insights that are not possible using only a single data source. Owing to all of these useful features, many biological and chemical databases present their data in RDF, and support SPARQL querying. In our project, we primary focused on PubChem, ChEMBL and ChEBI small-molecule datasets. These datasets are already being exported to RDF by their creators. However, none of them has an official and currently supported SPARQL endpoint. This omission makes it difficult to construct complex or federated queries that could access all of the datasets, thus underutilising the main advantage of the availability of RDF data. Our goal is to address this gap by integrating the datasets into one database called the Integrated Database of Small Molecules (IDSM) that will be accessible through a SPARQL endpoint. Beyond that, we will also focus on increasing mutual interoperability of the datasets. To realise the endpoint, we decided to implement an in-house developed SPARQL engine based on the PostgreSQL relational database for data storage. In our approach, data are stored in the traditional relational form, and the SPARQL engine translates incoming SPARQL queries into equivalent SQL queries. An important feature of the engine is that it optimises the resulting SQL queries. Together with optimisations performed by PostgreSQL, this allows efficient evaluations of SPARQL queries. The endpoint provides not only querying in the dataset, but also the compound substructure and similarity search supported by our Sachem project. Although the endpoint is accessible from an internet browser, it is mainly intended to be used for programmatic access by other services, for example as a part of federated queries. For regular users, we offer a rich web application called ChemWebRDF using the endpoint. The application is publicly available at https://idsm.elixir-czech.cz/chemweb/.

Download Full-text

A Three Pillar Strategy for a fully agnostic semantic framework enabling sustainable clinical data interoperability (Preprint)

10.2196/preprints.27591 ◽

2021 ◽

Author(s):

Christophe Gaudet-Blavignac ◽

Jean Louis Raisaro ◽

Vasundra Touré ◽

Sabine Österle ◽

Katrin Crameri ◽

...

Keyword(s):

Data Storage ◽

Query Language ◽

Healthcare Research ◽

Data Models ◽

Relational Data ◽

University Hospitals ◽

Data Interoperability ◽

Innovative Strategy ◽

Semantic Framework ◽

Long Run

BACKGROUND Interoperability is a well-known challenge in medical informatics. Trends in interoperability can be organized in three streams: knowledge, data and process. The relevant standards can be classified into three levels of complexity – serial, organization and meta-organization. Despite many initiatives and organizations, the interoperability challenge has not yet been resolved, and calls for innovative ways of addressing current deficiencies are still frequently heard. OBJECTIVE In this paper, we describe an innovative strategy for data interoperability, based on three distinct but complementary pillars, and its implementation within the framework of the Swiss Personalized Health Network (SPHN). METHODS Our strategy involves an ontology- and model-agnostic approach based on three pillars: (i) an integrative and usability-focused semantic framework that includes a set of concept definitions and terminology bindings to existing controlled vocabularies in accordance with the context of use, (ii) a description-based formalism for model-agnostic data storage and transport, and (iii) a purpose-specific transformation into specific data models depending on the use case. RESULTS The proposed interoperability strategy has been implemented in the context of the SPHN initiative to serve the data-sharing needs of multicenter research projects. The semantic framework was created by a dedicated working group, comprising data and semantics experts from Swiss university hospitals, in collaboration with the SPHN Data Coordination Center (DCC). The framework consists of a list of semantic concepts that capture the information required by each project and are mapped to international terminologies based on purpose of use. The Resource Description Framework (RDF) was used to formally describe the concepts and their relationships to a common schema that is used to generate instances for storing and transporting hospitals’ data. Finally, data transformers based on the SPARQL query language have been implemented to convert data from the RDF representation into various tabular representations (including common relational data models) that can be processed by analytics software for downstream analysis. CONCLUSIONS The proposed strategy is not meant to replace existing standards, but to use them in a synergistic, purpose-specific way in order to enable semantic interoperability of health data at the interface between the healthcare, research and regulatory communities. Its wide adoption within the SPHN framework would lay the foundations for the creation of FAIR (findable, accessible, interoperable, reusable) graph-based data endpoints at Swiss university hospitals which could, in the long run, replace legacy data warehousing solutions based on relational data models, thus facilitating multicenter research for precision medicine in Switzerland and abroad.

Download Full-text

Storing massive Resource Description Framework (RDF) data: a survey

The Knowledge Engineering Review ◽

10.1017/s0269888916000217 ◽

2016 ◽

Vol 31 (4) ◽

pp. 391-413 ◽

Cited By ~ 21

Author(s):

Zongmin Ma ◽

Miriam A. M. Capretz ◽

Li Yan

Keyword(s):

Data Storage ◽

Resource Description Framework ◽

Relational Databases ◽

Query Language ◽

Current State ◽

Rdf Data ◽

Description Framework ◽

Resource Description ◽

Flexible Model ◽

The Web

AbstractThe Resource Description Framework (RDF) is a flexible model for representing information about resources on the Web. As a W3C (World Wide Web Consortium) Recommendation, RDF has rapidly gained popularity. With the widespread acceptance of RDF on the Web and in the enterprise, a huge amount of RDF data is being proliferated and becoming available. Efficient and scalable management of RDF data is therefore of increasing importance. RDF data management has attracted attention in the database and Semantic Web communities. Much work has been devoted to proposing different solutions to store RDF data efficiently. This paper focusses on using relational databases and NoSQL (for ‘not only SQL (Structured Query Language)’) databases to store massive RDF data. A full up-to-date overview of the current state of the art in RDF data storage is provided in the paper.

Download Full-text

Creating RESTful APIs over SPARQL endpoints using RAMOSE

Semantic Web ◽

10.3233/sw-210439 ◽

2021 ◽

pp. 1-19

Author(s):

Marilena Daquino ◽

Ivan Heibi ◽

Silvio Peroni ◽

David Shotton

Keyword(s):

Semantic Web ◽

Service Providers ◽

Query Language ◽

Sparql Query ◽

Direct Access ◽

Technical Solution ◽

Semantic Web Technologies ◽

Web Technologies ◽

Sparql Endpoint ◽

Rdf Data

Semantic Web technologies are widely used for storing RDF data and making them available on the Web through SPARQL endpoints, queryable using the SPARQL query language. While the use of SPARQL endpoints is strongly supported by Semantic Web experts, it hinders broader use of RDF data by common Web users, engineers and developers unfamiliar with Semantic Web technologies, who normally rely on Web RESTful APIs for querying Web-available data and creating applications over them. To solve this problem, we have developed RAMOSE, a generic tool developed in Python to create REST APIs over SPARQL endpoints. Through the creation of source-specific textual configuration files, RAMOSE enables the querying of SPARQL endpoints via simple Web RESTful API calls that return either JSON or CSV-formatted data, thus hiding all the intrinsic complexities of SPARQL and RDF from common Web users. We provide evidence that the use of RAMOSE to provide REST API access to RDF data within OpenCitations triplestores is beneficial in terms of the number of queries made by external users of such RDF data using the RAMOSE API, compared with the direct access via the SPARQL endpoint. Our findings show the importance for suppliers of RDF data of having an alternative API access service, which enables its use by those with no (or little) experience in Semantic Web technologies and the SPARQL query language. RAMOSE can be used both to query any SPARQL endpoint and to query any other Web API, and thus it represents an easy generic technical solution for service providers who wish to create an API service to access Linked Data stored as RDF in a triplestore.

Download Full-text

Mobile Software Assurance Informed through Knowledge Graph Construction: The OWASP Threat of Insecure Data Storage

Journal of Computer Science Research ◽

10.30564/jcsr.v2i2.1765 ◽

2020 ◽

Vol 2 (2) ◽

Author(s):

Suzanna Schmeelk ◽

Lixin Tao

Keyword(s):

Data Storage ◽

Program Analysis ◽

Web Application ◽

Security Analysis ◽

Knowledge Graph ◽

Healthcare Applications ◽

Sensitive Data ◽

Knowledge Graphs ◽

Mobile Malware Detection ◽

Software Assurance

Many organizations, to save costs, are movinheg to t Bring Your Own Mobile Device (BYOD) model and adopting applications built by third-parties at an unprecedented rate. Our research examines software assurance methodologies specifically focusing on security analysis coverage of the program analysis for mobile malware detection, mitigation, and prevention. This research focuses on secure software development of Android applications by developing knowledge graphs for threats reported by the Open Web Application Security Project (OWASP). OWASP maintains lists of the top ten security threats to web and mobile applications. We develop knowledge graphs based on the two most recent top ten threat years and show how the knowledge graph relationships can be discovered in mobile application source code. We analyze 200+ healthcare applications from GitHub to gain an understanding of their software assurance of their developed software for one of the OWASP top ten moble threats, the threat of “Insecure Data Storage.” We find that many of the applications are storing personally identifying information (PII) in potentially vulnerable places leaving users exposed to higher risks for the loss of their sensitive data.

Download Full-text

Performance Benchmarking of Key-Value Store NoSQL Databases

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v8i6.pp5333-5341 ◽

2018 ◽

Vol 8 (6) ◽

pp. 5333

Author(s):

Omoruyi Osemwegie ◽

Kennedy Okokpujie ◽

Nsikan Nkordeh ◽

Charles Ndujiuba ◽

Samuel John ◽

...

Keyword(s):

Data Storage ◽

Web Applications ◽

Query Language ◽

Research Work ◽

Database Systems ◽

Nosql Databases ◽

Performance Benchmarking ◽

Nosql Database ◽

Structured Query Language ◽

Web Developers

<p>Increasing requirements for scalability and elasticity of data storage for web applications has made Not Structured Query Language NoSQL databases more invaluable to web developers. One of such NoSQL Database solutions is Redis. A budding alternative to Redis database is the SSDB database, which is also a key-value store but is disk-based. The aim of this research work is to benchmark both databases (Redis and SSDB) using the Yahoo Cloud Serving Benchmark (YCSB). YCSB is a platform that has been used to compare and benchmark similar NoSQL database systems. Both databases were given variable workloads to identify the throughput of all given operations. The results obtained shows that SSDB gives a better throughput for majority of operations to Redis’s performance.</p>

Download Full-text

Malaria Dashboard: a digital platform for epidemiological data from India (Preprint)

10.2196/preprints.35942 ◽

2021 ◽

Author(s):

Chander Prakash Yadav ◽

Amit Sharma

Keyword(s):

Web Application ◽

Epidemiological Data ◽

Web Page ◽

Digital Tool ◽

R Software ◽

Malaria Epidemiology ◽

Interactive Analysis ◽

Data Source ◽

R Packages ◽

Excel File

BACKGROUND A digital dashboard on malaria epidemiological data will be an invaluable resource for the research community and the planning of malaria control. OBJECTIVE To develop a digital Malaria Dashboard (MDB) for malaria epidemiological data METHODS We have developed a digital Malaria Dashboard (MDB) using the R software. A total of thirteen different R packages were used in this process, within which shiny and ggplot2 were used more intensively. The MDB is a web application that can work online as well as offline. Presently it is available in offline mode only. The MS Excel file may be used as an input data source and any personal computer may be used for this application. RESULTS The MDB is a highly versatile interface that allows prompt and interactive analysis of malaria epidemiological data. The primary interface of MDB is like a web page that has 14 tabs (or pages), some more tabs may be added or deleted as per requirement and each tab corresponds to a particular analysis. A user may move from one tab to another via tab icons. Each tab thus allows flexibility in correlating various parameters like SPR, API, AFI, ABER, RT, malaria cases, death due to malaria, BSC, and BSE. The data can be analyzed in required granularity (national, state, district), and its enhanced visualization allows for facile usage. Using the MDB, one can quickly assess national or more granular scenarios in a time series manner and then compare the malaria epidemiology in various states and their constituent districts. CONCLUSIONS This MDB is a highly effective digital tool for studying the malaria situation and strategizing for malaria elimination and researcher may use it as a prototype for developing some other dashboards in their own fields.

Download Full-text

GuessXQ

Innovations in XML Applications and Metadata Management ◽

10.4018/978-1-4666-2669-0.ch004 ◽

2013 ◽

pp. 57-76

Author(s):

Daniela Morais Fonte ◽

Daniela da Cruz ◽

Pedro Rangel Henriques ◽

Alda Lopes Gancarski

Keyword(s):

Information Retrieval ◽

Learning Process ◽

Web Application ◽

Relational Databases ◽

Query Language ◽

General Purpose ◽

Considerable Effort ◽

Document Structure ◽

Query By Example ◽

Xml Documents

XML is a widely used general-purpose annotation formalism for creating custom markup languages. XML annotations give structure to plain documents to interpret their content. To extract information from XML documents XPath and XQuery languages can be used. However, the learning of these dialects requires a considerable effort. In this context, the traditional Query-By-Example methodology (for Relational Databases) can be an important contribution to leverage this learning process, freeing the user from knowing the specific query language details or even the document structure. This chapter describes how to apply the Query-By-Example concept in a Web-application for information retrieval from XML documents, the GuessXQ system. This engine is capable of deducing, from an example, the respective XQuery statement. The example consists of marking the desired components directly on a sample document, picked-up from a collection. After inferring the corresponding query, GuessXQ applies it to the collection to obtain the desired result.

Download Full-text

Different Perspectives of Cloud Security

Cloud Technology ◽

10.4018/978-1-4666-6539-2.ch066 ◽

2015 ◽

pp. 1432-1449

Author(s):

M. Sundaresan ◽

D. Boopathy

Keyword(s):

Cloud Computing ◽

Data Storage ◽

Service Provider ◽

Web Application ◽

Cloud Storage ◽

Minimum Cost ◽

Maximum Level ◽

Local Data ◽

Legal Security ◽

Processing And Storage

Cloud storage systems can be considered to be a network of distributed datacenters that typically use cloud computing technology like virtualization and offer some kind of interface for storing data. To increase the availability of the data, it may be redundantly stored at different locations. Basic cloud storage is generally not designed to be accessed directly by users but rather incorporated into custom software using API. Cloud computing involves other processes besides storage. In this chapter, the authors discuss different viewpoints for cloud computing from the user, legal, security, and service provider perspectives. From the user viewpoint, the stored data creates a mirror of currently available local data. The backup feature allows users to recover any version of a previously stored data. Synchronization is the process of establishing consistency among the stored data. From the legal viewpoint, provisions regulating the user processing and storage of the data must have to be constant from when the data is stored in the cloud. The security viewpoint requires interaction with the Web application, data storage, and transmission. The service provider viewpoint requires the maximum level of cloud storage service at the minimum cost.

Download Full-text

Towards Massive RDF Storage in NoSQL Databases

Advances in Data Mining and Database Management - Emerging Technologies and Applications in Data Processing and Management ◽

10.4018/978-1-5225-8446-9.ch013 ◽

2019 ◽

pp. 263-284 ◽

Cited By ~ 2

Author(s):

Zongmin Ma ◽

Li Yan

Keyword(s):

Data Storage ◽

Large Scale ◽

Future Research ◽

Nosql Databases ◽

Current State ◽

Data Store ◽

Rdf Data ◽

Description Framework ◽

Resource Description ◽

The Web

The resource description framework (RDF) is a model for representing information resources on the web. With the widespread acceptance of RDF as the de-facto standard recommended by W3C (World Wide Web Consortium) for the representation and exchange of information on the web, a huge amount of RDF data is being proliferated and becoming available. So, RDF data management is of increasing importance and has attracted attention in the database community as well as the Semantic Web community. Currently, much work has been devoted to propose different solutions to store large-scale RDF data efficiently. In order to manage massive RDF data, NoSQL (not only SQL) databases have been used for scalable RDF data store. This chapter focuses on using various NoSQL databases to store massive RDF data. An up-to-date overview of the current state of the art in RDF data storage in NoSQL databases is provided. The chapter aims at suggestions for future research.

Download Full-text

A Review of RDF Storage in NoSQL Databases

Advances in Systems Analysis, Software Engineering, and High Performance Computing - Managing Big Data in Cloud Computing Environments ◽

10.4018/978-1-4666-9834-5.ch009 ◽

2016 ◽

pp. 210-229 ◽

Cited By ~ 2

Author(s):

Zongmin Ma ◽

Li Yan

Keyword(s):

Data Storage ◽

Large Scale ◽

Future Research ◽

Nosql Databases ◽

Current State ◽

Data Store ◽

Rdf Data ◽

Description Framework ◽

Resource Description ◽

The Web

The Resource Description Framework (RDF) is a model for representing information resources on the Web. With the widespread acceptance of RDF as the de-facto standard recommended by W3C (World Wide Web Consortium) for the representation and exchange of information on the Web, a huge amount of RDF data is being proliferated and becoming available. So RDF data management is of increasing importance, and has attracted attentions in the database community as well as the Semantic Web community. Currently much work has been devoted to propose different solutions to store large-scale RDF data efficiently. In order to manage massive RDF data, NoSQL (“not only SQL”) databases have been used for scalable RDF data store. This chapter focuses on using various NoSQL databases to store massive RDF data. An up-to-date overview of the current state of the art in RDF data storage in NoSQL databases is provided. The chapter aims at suggestions for future research.

Download Full-text