Querying Rich Ontologies by Exploiting the Structure of Data

2020 ◽  
Vol 34 (3) ◽  
pp. 395-398 ◽  
Author(s):  
Labinot Bajraktari

Abstract: Ontology-based data access (OBDA) has emerged as a paradigm for accessing heterogeneous and incomplete data sources. A fundamental reasoning service in OBDA, ontology-mediated query (OMQ) answering, has received much attention from the research community. However, there is a disparity between research on OMQ algorithms for lightweight DLs, which have found their way into practical implementations, and algorithms for expressive DLs, where the work has pursued mainly theoretical goals. The dissertation develops a technique that leverages the structural properties of data to alleviate the problems that typically arise when answering queries in expressive settings. This paper gives a brief summary of the technique along with the different algorithms developed for OMQs over expressive DLs.
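To make the OMQ setting concrete, the following sketch illustrates the rewriting flavour of ontology-mediated query answering (the concept names, axioms, and data are hypothetical, and this is not the dissertation's algorithm): a concept inclusion such as Professor ⊑ Teacher lets a query atom for Teacher be expanded over all entailed subconcepts, so certain answers are found even in incomplete data.

```python
# Sketch of ontology-mediated query answering by query rewriting.
# Axioms are simple concept inclusions (Sub, Super); the query atom is
# expanded with all transitively entailed subconcepts before matching data.

def subconcepts(concept, axioms):
    """All concepts entailed to be subsumed by `concept` (including itself)."""
    result, frontier = {concept}, [concept]
    while frontier:
        current = frontier.pop()
        for sub, sup in axioms:
            if sup == current and sub not in result:
                result.add(sub)
                frontier.append(sub)
    return result

def certain_answers(concept, axioms, data):
    """Individuals asserted to belong to `concept` or any subconcept."""
    wanted = subconcepts(concept, axioms)
    return {ind for ind, c in data if c in wanted}

axioms = [("Professor", "Teacher"), ("Lecturer", "Teacher")]
data = [("alice", "Professor"), ("bob", "Lecturer"), ("carol", "Student")]
print(sorted(certain_answers("Teacher", axioms, data)))  # → ['alice', 'bob']
```

Neither "alice" nor "bob" is asserted to be a Teacher, yet both are certain answers once the axioms are taken into account; this gap between asserted and entailed facts is what makes OMQ answering in expressive DLs costly.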

Author(s):  
Jon Hael Simon Brenas ◽  
Mohammad S. Al-Manir ◽  
Kate Zinszer ◽  
Christopher J. Baker ◽  
Arash Shaban-Nejad

Objective
Malaria is one of the top causes of death in Africa and some other regions of the world. Data-driven surveillance activities are essential for enabling timely interventions that alleviate the impact of the disease and eventually eliminate malaria. Improving the interoperability of data sources through the use of shared semantics is a key consideration when designing surveillance systems, which must be robust in the face of dynamic changes to one or more components of a distributed infrastructure. Here we introduce a semantic framework to improve the interoperability of malaria surveillance systems (SIEMA).

Introduction
In 2015, there were 212 million new cases of malaria and about 429,000 malaria deaths worldwide. African countries accounted for almost 90% of global malaria cases and 92% of malaria deaths. Currently, malaria data are scattered across different countries, laboratories, and organizations in heterogeneous data formats and repositories. The diversity of access methodologies makes it difficult to retrieve relevant data in a timely manner. Moreover, the lack of rich metadata limits the reusability of the data and its integration. The current process of discovering, accessing, and reusing the data is inefficient and error-prone, profoundly hindering surveillance efforts. As our knowledge about malaria and appropriate preventive measures becomes more comprehensive, malaria data management systems, data collection standards, and data stewardship are certain to change regularly. Collectively, these changes will make it more difficult to perform accurate data analytics or achieve reliable estimates of important metrics, such as infection rates. Consequently, there is a critical need to rapidly re-assess the integrity of the data and knowledge infrastructures that experts depend on to support their surveillance tasks.

Methods
To address the heterogeneity of malaria data sources, we recruit domain-specific ontologies in the field (e.g. IDOMAL (1)) that define a shared lexicon of concepts and relations. These ontologies are expressed in the standard Web Ontology Language (OWL). To overcome challenges in accessing distributed data resources, we have adopted the Semantic Automated Discovery & Integration framework (SADI) (2) to ensure interoperability. SADI provides a way to describe services that provide access to data, detailing the inputs and outputs of each service along with a functional description. Existing ontology terms are used when building SADI service descriptions. The services can be discovered by querying a registry and combined into complex workflows. Users can issue queries in SPARQL syntax to a query engine, which can plan complex workflows to fetch the actual data, without having to know how the target data is structured or where it is located. To handle changes in the target data sources, the ontologies, or the service definitions, we created a Dashboard (3) that can report any changes. The Dashboard reuses existing tools to perform a series of checks; these tools compare versions of ontologies and databases, allowing the Dashboard to report the changes. Once a change has been identified, a series of recommendations can be made, e.g. services can be retired or updated so that data access can continue.

Results
We used the Mosquito Insecticide Resistance Ontology (MIRO) (4) to define the common lexicon for our data sources and queries. The sources we created are CSV files that use the IRbase (4) schema. With the data so defined, we specified several SPARQL queries and the SADI services needed to answer them. These services were designed to enable access to data separated across different files in different formats. To showcase the capabilities of our Dashboard, we also modified parts of the service definitions, the ontology, and the data sources, which allowed us to test our change-detection capabilities. Once changes were detected, we manually updated the services to comply with the revised ontology and data sources and checked that the proposed changes yielded services that gave the right answers. In the future, we plan to make the updating of the services automatic.

Conclusions
Making the relevant information seamlessly accessible to a surveillance expert is critical in tackling and ultimately eliminating malaria. To achieve this, we used existing ontologies and semantic web services to increase the interoperability of the various sources. Since both the data and the ontologies are likely to change frequently, we also designed a tool that detects and identifies changes and updates the services accordingly, so that the whole surveillance system becomes more resilient.

References
1. P. Topalis, E. Mitraka, V. Dritsou, E. Dialynas and C. Louis, "IDOMAL: the malaria ontology revisited", Journal of Biomedical Semantics, vol. 4, no. 1, p. 16, Sep 2013.
2. M. D. Wilkinson, B. Vandervalk and L. McCarthy, "The Semantic Automated Discovery and Integration (SADI) web service design-pattern, API and reference implementation", Journal of Biomedical Semantics, vol. 2, no. 1, p. 8, 2011.
3. J. H. Brenas, M. S. Al-Manir, C. J. O. Baker and A. Shaban-Nejad, "Change management dashboard for the SIEMA global surveillance infrastructure", International Semantic Web Conference, 2017.
4. E. Dialynas, P. Topalis, J. Vontas and C. Louis, "MIRO and IRbase: IT Tools for the Epidemiological Monitoring of Insecticide Resistance in Mosquito Disease Vectors", PLOS Neglected Tropical Diseases, 2009.
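The Dashboard's change-detection step can be sketched as follows (a minimal illustration with hypothetical ontology term names, not SIEMA's actual implementation): two ontology versions are diffed, and any service description whose inputs or outputs reference a removed term is flagged for retirement or update.

```python
# Sketch of change detection for semantic services: diff two ontology
# versions and flag service descriptions that reference removed terms.

def diff_ontology(old_terms, new_terms):
    """Terms dropped from, and terms introduced in, the new version."""
    return {"removed": old_terms - new_terms, "added": new_terms - old_terms}

def recommend(services, removed_terms):
    """Suggest retiring/updating services that use a removed ontology term."""
    return {name: "retire or update"
            for name, used in services.items() if used & removed_terms}

old = {"miro:InsecticideResistance", "miro:Mosquito", "miro:BioassayResult"}
new = {"miro:InsecticideResistance", "miro:Mosquito", "miro:PhenotypicResult"}
services = {"getResistanceByRegion": {"miro:BioassayResult"},
            "getSpecies": {"miro:Mosquito"}}

changes = diff_ontology(old, new)
print(recommend(services, changes["removed"]))
# → {'getResistanceByRegion': 'retire or update'}
```

A real registry would compare full OWL class hierarchies rather than flat term sets, but the principle is the same: detect the delta first, then map it onto the services that depend on the changed terms.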


2021 ◽  
Author(s):  
Benjamin Moreno-Torres ◽  
Christoph Völker ◽  
Sabine Kruschwitz

Non-destructive testing (NDT) data in civil engineering is regularly used for scientific analysis. However, there is no uniform representation of the data yet, so an analysis of distributed data sets across different test objects is too difficult in most cases.

To overcome this, we present an approach for the integrated management of distributed data sets based on Semantic Web technologies. The cornerstone of this approach is an ontology, a semantic knowledge representation of our domain. This NDT-CE ontology is later populated with the data sources. Using the properties and relationships between the concepts that the ontology contains, we make these data sets meaningful to machines as well. Furthermore, the ontology can be used as a central interface for database access. Non-domain data sources can be integrated by linking them with the NDT ontology, making them directly available for generic use in terms of digitization. Based on extensive literature research, we outline the resulting possibilities for NDT in civil engineering, such as the computer-aided sorting and analysis of measurement data and the recognition and explanation of correlations.

A common knowledge representation and data access allows the scientific exploitation of existing data sources with data-based methods (such as image recognition, measurement uncertainty calculations, factor analysis or material characterization) and simplifies bidirectional knowledge and data transfer between engineers and NDT specialists.
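The ontology-as-interface idea can be reduced to a small sketch (the concept IRIs and column names here are invented for illustration and are not from the actual NDT-CE ontology): fields from heterogeneous measurement files are renamed onto shared ontology terms, after which one query works across all sources.

```python
# Sketch of integrating heterogeneous NDT data via a shared vocabulary:
# per-source column names are mapped to common (hypothetical) ontology terms.

MAPPINGS = {
    "source_a": {"f_hz": "ndt:Frequency", "amp": "ndt:Amplitude"},
    "source_b": {"frequency": "ndt:Frequency", "amplitude_db": "ndt:Amplitude"},
}

def to_ontology(source, record):
    """Rename a raw record's fields to the shared ontology terms."""
    return {MAPPINGS[source][k]: v for k, v in record.items()}

rows = [to_ontology("source_a", {"f_hz": 50.0, "amp": 0.7}),
        to_ontology("source_b", {"frequency": 60.0, "amplitude_db": -3.0})]

# A single "query" over both sources, now that the fields are aligned.
frequencies = [r["ndt:Frequency"] for r in rows]
print(frequencies)  # → [50.0, 60.0]
```

In practice the mapping would also reconcile units (here source_b reports amplitude in dB), which is exactly the kind of semantics an ontology makes explicit.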


Author(s):  
Feyza Gürbüz ◽  
Fatma Gökçe Önen

The previous decades have witnessed major change within the Information Systems (IS) environment, with a corresponding emphasis on the importance of specifying timely and accurate information strategies. Currently, there is increasing interest in data mining and information systems optimization, which makes data mining for the optimization of information systems a new and growing research area. This chapter surveys the application of data mining to the optimization of information systems. These systems have different data sources and, accordingly, different objectives for knowledge discovery. After the preprocessing stage, data mining techniques can be applied to the data suited to the objective of the information system. These techniques include prediction, classification, association rule mining, statistics and visualization, clustering, and outlier detection.
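As a minimal sketch of one listed technique, association rule mining, the following computes support and confidence for a rule A → B over a (made-up) log of user sessions; the chapter itself does not prescribe this implementation.

```python
# Support and confidence for an association rule A -> B over transactions.

def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """P(consequent | antecedent), estimated from the transactions."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

# Hypothetical session logs of an information system.
logs = [{"login", "search", "export"},
        {"login", "search"},
        {"login", "export"},
        {"search", "export"}]

print(support({"search"}, logs))                           # → 0.75
print(round(confidence({"search"}, {"export"}, logs), 2))  # → 0.67
```

A rule like {search} → {export} with high confidence could then feed back into the system's optimization, e.g. by pre-fetching export resources after a search.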


2019 ◽  
pp. 254-277 ◽  
Author(s):  
Ying Zhang ◽  
Chaopeng Li ◽  
Na Chen ◽  
Shaowen Liu ◽  
Liming Du ◽  
...  

Since large amounts of geospatial data are produced by various sources, geospatial data integration is difficult because of a shortage of semantics. Although standardised data formats and data access protocols, such as the Web Feature Service (WFS), give end-users access to heterogeneous data stored in different formats from various sources, integration remains time-consuming and ineffective due to the lack of semantics. To solve this problem, a prototype for geospatial data integration is proposed that addresses four problems: geospatial data retrieving, modeling, linking and integrating. We adopt four kinds of geospatial data sources to evaluate the performance of the proposed approach. The experimental results illustrate that the proposed linking method achieves high performance in generating the matched candidate record pairs in terms of Reduction Ratio (RR), Pairs Completeness (PC), Pairs Quality (PQ) and F-score. The integration results show that each data source gains substantial Complementary Completeness (CC) and Increased Completeness (IC).
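The evaluation metrics named in the abstract have standard definitions in the record linkage literature, sketched below with illustrative numbers (the chapter's exact formulations may differ): RR measures how much the candidate-generation step shrinks the comparison space, PC and PQ act as recall and precision over the candidate pairs, and the F-score combines PC and PQ.

```python
# Standard record linkage evaluation metrics: RR, PC, PQ, and F-score.

def reduction_ratio(n_candidates, n_total_pairs):
    """How much of the full pairwise comparison space was pruned."""
    return 1 - n_candidates / n_total_pairs

def pairs_completeness(true_matches_in_candidates, total_true_matches):
    """Recall: fraction of true matches surviving candidate generation."""
    return true_matches_in_candidates / total_true_matches

def pairs_quality(true_matches_in_candidates, n_candidates):
    """Precision: fraction of candidate pairs that are true matches."""
    return true_matches_in_candidates / n_candidates

def f_score(pc, pq):
    """Harmonic mean of pairs completeness and pairs quality."""
    return 2 * pc * pq / (pc + pq)

# Illustrative numbers: 10,000 possible pairs, 150 candidates generated,
# 90 of 100 true matches retained among the candidates.
rr = reduction_ratio(150, 10_000)
pc = pairs_completeness(90, 100)
pq = pairs_quality(90, 150)
print(round(rr, 3), pc, pq, round(f_score(pc, pq), 2))  # → 0.985 0.9 0.6 0.72
```

The tension the metrics capture is visible even in this toy setting: generating fewer candidates raises RR and PQ but risks lowering PC.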


2019 ◽  
pp. 230-253
Author(s):  
Ying Zhang ◽  
Chaopeng Li ◽  
Na Chen ◽  
Shaowen Liu ◽  
Liming Du ◽  
...  

Since large amounts of geospatial data are produced by various sources and stored in incompatible formats, geospatial data integration is difficult because of a shortage of semantics. Although standardised data formats and data access protocols, such as the Web Feature Service (WFS), give end-users access to heterogeneous data stored in different formats from various sources, integration remains time-consuming and ineffective due to the lack of semantics. To solve this problem, a prototype for geospatial data integration is proposed that addresses four problems: geospatial data retrieving, modeling, linking and integrating. First, we provide a uniform integration paradigm for users to retrieve geospatial data. Then, we align the retrieved geospatial data in the modeling process to eliminate heterogeneity with the help of Karma. Our main contribution focuses on the third problem. Previous work defined a set of semantic rules for performing the linking process. However, geospatial data has specific geospatial relationships, which are significant for linking but cannot be handled by Semantic Web techniques directly. We take advantage of these unique features of geospatial data to implement the linking process. In addition, previous work runs into a complicated problem when the geospatial data sources are in different languages. In contrast, our proposed linking algorithms are endowed with a translation function, which saves the cost of translating among geospatial sources in different languages. Finally, the geospatial data is integrated by eliminating data redundancy and combining the complementary properties from the linked records. We adopt four kinds of geospatial data sources, namely OpenStreetMap (OSM), Wikimapia, USGS and EPA, to evaluate the performance of the proposed approach. The experimental results illustrate that the proposed linking method achieves high performance in generating the matched candidate record pairs in terms of Reduction Ratio (RR), Pairs Completeness (PC), Pairs Quality (PQ) and F-score. The integration results show that each data source gains substantial Complementary Completeness (CC) and Increased Completeness (IC).
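The abstract's point that geospatial relationships aid linking can be illustrated with a small sketch (illustrative thresholds and records, not the chapter's actual algorithm): two records from different sources are linked only when they are both spatially close and lexically similar.

```python
# Sketch of geospatial record linking: combine spatial proximity
# (haversine distance) with name similarity.
import math
from difflib import SequenceMatcher

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat, dlon = p2 - p1, math.radians(lon2 - lon1)
    a = (math.sin(dlat / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(a))

def is_link(rec_a, rec_b, max_km=0.5, min_sim=0.8):
    """Link two records iff they are nearby AND their names match closely."""
    close = haversine_km(rec_a["lat"], rec_a["lon"],
                         rec_b["lat"], rec_b["lon"]) <= max_km
    similar = SequenceMatcher(None, rec_a["name"].lower(),
                              rec_b["name"].lower()).ratio() >= min_sim
    return close and similar

osm = {"name": "Central Park", "lat": 40.7829, "lon": -73.9654}
wiki = {"name": "central park", "lat": 40.7825, "lon": -73.9650}
print(is_link(osm, wiki))  # → True
```

The spatial test is what plain Semantic Web equivalence rules cannot express directly: two features with differently spelled names in different languages can still be linked when their coordinates nearly coincide.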


Author(s):  
Benard M. Maake ◽  
Sunday O. Ojo ◽  
Tranos Zuva

Research-related publications and articles have flooded the internet, and researchers are in quest of better tools and technologies to improve the recommendation of relevant research papers. Since the introduction of research paper recommender systems, more than 400 articles related to research paper recommendation have been published. These articles describe the numerous tools, methodologies, and technologies used in recommending research papers, and further highlight issues that need the attention of the research community. Few operational research paper recommender systems have been developed, though. The main objective of this review is to summarize the classification categories of state-of-the-art research paper recommender systems. Findings and concepts on data access and manipulation in the field of research paper recommendation are highlighted, summarized, and disseminated. This chapter reviews articles in the field of research paper recommender systems published from the early 1990s until 2017.
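One of the classic categories such reviews cover is content-based recommendation; the following is a deliberately minimal sketch of that idea (not any surveyed system): candidate papers are ranked by cosine similarity of their term counts against a query paper's title.

```python
# Minimal content-based paper recommender: cosine similarity of word counts.
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def recommend(query_title, candidates, k=1):
    """Return the k candidate titles most similar to the query title."""
    q = Counter(query_title.lower().split())
    scored = [(cosine(q, Counter(t.lower().split())), t) for t in candidates]
    return [t for _, t in sorted(scored, reverse=True)[:k]]

papers = ["Deep learning for citation recommendation",
          "Soil chemistry field methods",
          "Graph-based research paper recommendation"]
print(recommend("research paper recommendation systems", papers))
# → ['Graph-based research paper recommendation']
```

Real systems surveyed in the literature add TF-IDF weighting, citation graphs, and collaborative signals on top of this basic content-matching core.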




2020 ◽  
pp. injuryprev-2020-043677
Author(s):  
Anita Radovnikovic ◽  
Otmar Geiss ◽  
Stylianos Kephalopoulos ◽  
Vittorio Reina ◽  
Josefa Barrero ◽  
...  

The availability of data on consumer product-related accidents and injuries is of interest to a wide range of stakeholders, such as consumer product safety and injury prevention policymakers, market surveillance authorities, consumer organisations, standardisation organisations, manufacturers and the public. While the amount of information available and potentially of use for product safety is considerable in some European Union (EU) countries, its usability at EU level is limited by the high fragmentation of the data sources, the diversity of data collection methods and increasing data protection concerns. To satisfy the policy need for more timely information on consumer product-related incidents, beyond the injury data historically collected by the public health sector, a number of 'alternative' data sources were assessed as potential sources of interest. This study explores the opportunities for enhancing the availability of data on consumer product-related injuries arising from selected existing and 'alternative' data sources widely present in Europe, such as firefighters' and poison centres' records, mortality statistics, consumer complaints, insurance companies' registers, manufacturers' incident registers and online news sources. These data sources, coupled with IT technologies such as interlinking by remote data access, could fill the existing information gap. Strengths and weaknesses of the selected data sources, with a view to supporting a common data platform, are evaluated and presented. The study relied on a literature review, extensive use of surveys, interviews, workshops with experts and an online data-mining pilot study.


2014 ◽  
Vol 657 (1) ◽  
pp. 208-246
Author(s):  
John Robert Warren

In this article I define the main criteria that ought to be considered in evaluating the costs and benefits of various data resources that might be used for a new study of social and economic mobility in the United States. These criteria include population definition and coverage, sample size, topical coverage, temporal issues, spatial issues, sustainability, financial expense, and privacy and data access. I use these criteria to evaluate the strengths and weaknesses of several possible data resources for a new study of mobility, including existing smaller-scale surveys, the Current Population Survey, the American Community Survey, linked administrative data, and a new stand-alone survey. No option is perfect, and all involve trade-offs. I conclude by recommending five possible designs that are particularly strong on the criteria listed above.


Author(s):  
Bartosz Dobrzelecki ◽  
Amrey Krause ◽  
Alastair C. Hume ◽  
Alistair Grant ◽  
Mario Antonioletti ◽  
...  

OGSA-DAI (Open Grid Services Architecture Data Access and Integration) is a framework for building distributed data access and integration systems. Until recently, it lacked the built-in functionality for easily creating federations of distributed data sources. The latest release of the OGSA-DAI framework introduces the OGSA-DAI DQP (Distributed Query Processing) resource. The new resource encapsulates a distributed query processor that is able to orchestrate distributed data sources when answering declarative user queries. The query processor has many extensibility points, making it easy to customize. We have also introduced a new OGSA-DAI Views resource that provides a flexible method for defining views over relational data. The interoperability of the two new resources, together with the flexibility of the OGSA-DAI framework, allows the building of highly customized data integration solutions.
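What a DQP-style resource does behind a declarative query can be sketched in a few lines (illustrative, not the OGSA-DAI API): the request is decomposed into per-source sub-queries, each executed against its data source, and the partial results joined by the query processor.

```python
# Sketch of distributed query processing: a hash join across two "sources",
# as a DQP-style engine might orchestrate behind one declarative query.

# Two hypothetical federated data sources sharing a join key.
SOURCE_A = [{"id": 1, "name": "sensor-1"}, {"id": 2, "name": "sensor-2"}]
SOURCE_B = [{"id": 1, "reading": 17.5}, {"id": 2, "reading": 21.0}]

def federated_join(left, right, key):
    """Join rows fetched from two sources on `key` (hash join)."""
    index = {row[key]: row for row in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]

rows = federated_join(SOURCE_A, SOURCE_B, "id")
print(rows[0])  # → {'id': 1, 'name': 'sensor-1', 'reading': 17.5}
```

The value of the framework is that the user writes only the declarative query; which source holds which columns, and where the join executes, is the query processor's concern, and a Views-style resource can additionally present such a join as a single virtual relation.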

