Handling qualitative preferences in SPARQL over virtual ontology-based data access

Semantic Web ◽  
2022 ◽  
pp. 1-24
Author(s):  
Marlene Goncalves ◽  
David Chaves-Fraga ◽  
Oscar Corcho

With the increase in data volume in the heterogeneous datasets being published under Open Data initiatives, new operators are needed to help users find the subset of data that best satisfies their preference criteria. Quantitative approaches such as top-k queries may not be the most appropriate, as they require the user to assign weights, which may not be known beforehand, to a scoring function. Under the qualitative approach, which includes the well-known skyline, preference criteria are in certain cases more intuitive and can be expressed more naturally. In this paper, we address the problem of evaluating SPARQL qualitative preference queries in an Ontology-Based Data Access (OBDA) setting, which provides uniform access over multiple heterogeneous data sources. Our main contribution is Morph-Skyline++, a framework for processing SPARQL qualitative preferences by directly querying relational databases. It implements a technique that translates SPARQL qualitative preference queries directly into queries that can be evaluated by a relational database management system. We evaluate our approach over different scenarios, reporting the effects of data distribution, data size, and query complexity on the performance of the proposed technique in comparison with state-of-the-art techniques. The results suggest that execution time can be reduced by up to two orders of magnitude compared with current techniques, scaling up to larger datasets while precisely identifying the result set.
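The skyline operator named in the abstract can be sketched in a few lines of plain Python: a record dominates another if it is at least as good on every criterion and strictly better on at least one, and the skyline is the set of non-dominated records. This is a generic block-nested-loop sketch, not the Morph-Skyline++ algorithm; the hotel data and the lower-is-better convention are illustrative assumptions.

```python
# Minimal block-nested-loop skyline, assuming lower values are preferred
# on every criterion. The example records are invented, not from the paper.

def dominates(a, b):
    """a dominates b if a is <= b on all criteria and < b on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(records):
    result = []
    for r in records:
        if any(dominates(s, r) for s in result):
            continue                                          # r is dominated
        result = [s for s in result if not dominates(r, s)]   # drop points r dominates
        result.append(r)
    return result

# e.g. hotels as (price, distance): cheaper and closer is better
hotels = [(120, 3.0), (80, 5.0), (95, 2.0), (150, 1.0), (80, 5.5)]
print(skyline(hotels))  # [(80, 5.0), (95, 2.0), (150, 1.0)]
```

No record in the result is worse than another on both price and distance, which is exactly the weight-free preference semantics the abstract contrasts with top-k scoring.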

2020 ◽  
Vol 9 (8) ◽  
pp. 474
Author(s):  
Linfang Ding ◽  
Guohui Xiao ◽  
Diego Calvanese ◽  
Liqiu Meng

In a variety of applications relying on geospatial data, getting insights into heterogeneous geodata sources is crucial for decision making, but often challenging, because it typically requires combining information from different sources via data integration techniques and then making sense of the combined data via sophisticated analysis methods. To address this challenge, we draw on two well-established research areas, data integration and geovisual analytics, and propose an ontology-based approach to decouple the challenges of data access and analytics. Our framework consists of two modules centered around an ontology: (1) an ontology-based data integration (OBDI) module, in which mappings specify the relationship between the underlying data and a domain ontology; (2) a geovisual analytics (GeoVA) module, designed for exploring the integrated data by explicitly making use of standard ontologies. In this framework, ontologies play a central role by providing a coherent view over the heterogeneous data and by acting as a mediator for visual analysis tasks. We test our framework in a scenario investigating the spatiotemporal patterns of meteorological and traffic data from several open data sources. Initial studies show that our approach is feasible for the exploration and understanding of heterogeneous geospatial data.
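The core OBDI idea above, mappings that relate ontology terms to the underlying sources so that analytics never touches the raw schemas, can be illustrated with a toy query-unfolding sketch. All table, column, and class names here are hypothetical; real OBDI systems express such mappings in languages like R2RML rather than Python dictionaries.

```python
# Toy illustration of ontology-based data integration: mappings relate
# ontology classes/properties to SQL over the sources, and an ontology-level
# request is unfolded into SQL. All names are invented for illustration.

MAPPINGS = {
    # ontology class -> (source table, {ontology property: source column})
    "WeatherStation": ("met.stations",    {"id": "station_id", "temperature": "temp_c"}),
    "TrafficSensor":  ("traffic.sensors", {"id": "sensor_id",  "flow": "veh_per_h"}),
}

def unfold(cls, properties):
    """Rewrite an ontology-level request into SQL using the mappings."""
    table, cols = MAPPINGS[cls]
    select = ", ".join(f"{cols[p]} AS {p}" for p in properties)
    return f"SELECT {select} FROM {table}"

print(unfold("WeatherStation", ["id", "temperature"]))
# SELECT station_id AS id, temp_c AS temperature FROM met.stations
```

The analytics module only ever mentions ontology terms such as `temperature`; swapping the underlying source changes the mapping, not the analysis code, which is the decoupling the abstract argues for.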


2016 ◽  
Author(s):  
Ninoshka K. Singh ◽  
Darrell O Ricke

Major companies, healthcare professionals, the military, and other scientists and innovators now sense that fitness and health data from wearable biosensors will likely yield new discoveries and insights into the physiological, cognitive, and emotional health status of an individual. The ability to collect, process, and correlate data simultaneously from a set of heterogeneous biosensor sources may be a key factor in informing the development of new technologies for reducing health risks, improving health status, and possibly preventing and predicting disease. The challenge is getting easy access to heterogeneous data from a set of disparate sensors in a single, integrated wearable monitoring system. Oftentimes, the data recorded by commercial biosensing devices are locked within each manufacturer's proprietary platform. Summary data are available for some devices as free downloads or only as part of annual premium memberships, while access to raw measurements is generally unavailable, especially from a custom-developed application that may include prototype biosensors. In this paper, we explore key ideas on how to leverage the design features of Bluetooth Low Energy to ease the integration of disparate biosensors at the sensor communication layer. This component is intended to fit into a larger, multi-layered, open data framework that can provide additional data management and analytics capabilities for consumers and scientists alike at all layers of the data access model typically employed in a body sensor network system.
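One of the Bluetooth Low Energy design features that eases sensor integration is its standardized GATT characteristics: any heart-rate sensor exposing the standard Heart Rate Measurement characteristic (UUID 0x2A37) encodes its value the same way, so one parser serves many vendors. A sketch of that parser follows; per the GATT specification, bit 0 of the flags byte selects an 8-bit or a little-endian 16-bit heart-rate value (further optional fields such as energy expended are omitted here).

```python
# Sketch of parsing the standard Bluetooth GATT Heart Rate Measurement
# characteristic (UUID 0x2A37). Bit 0 of the flags byte selects whether
# the heart-rate value is a uint8 or a little-endian uint16.

def parse_heart_rate(payload: bytes) -> int:
    flags = payload[0]
    if flags & 0x01:                         # 16-bit heart-rate value
        return int.from_bytes(payload[1:3], "little")
    return payload[1]                        # 8-bit heart-rate value

print(parse_heart_rate(bytes([0x00, 72])))          # uint8 format  -> 72
print(parse_heart_rate(bytes([0x01, 0x2C, 0x01])))  # uint16 format -> 300
```

Because the encoding is fixed by the standard rather than by each manufacturer, this kind of parser sits naturally at the sensor communication layer of the framework the abstract describes.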


2021 ◽  
Vol 13 (5) ◽  
pp. 124
Author(s):  
Jiseong Son ◽  
Chul-Su Lim ◽  
Hyoung-Seop Shim ◽  
Ji-Sun Kang

Despite the development of various technologies and systems that use artificial intelligence (AI) to solve disaster-related problems, difficult challenges remain. Data are the foundation for solving diverse disaster problems using AI, big data analysis, and similar methods, so we must focus on these data. Disaster data are domain-specific by disaster type, heterogeneous, and lack interoperability. In particular, open data related to disasters raise several issues: the source and format of the data differ because they are collected by different organizations, and the vocabularies used in each domain are inconsistent. This study proposes a knowledge graph to resolve the heterogeneity among various disaster data and provide interoperability among domains. Among disaster domains, we describe a knowledge graph for flooding disasters built from Korean open datasets and cross-domain knowledge graphs. Furthermore, the proposed knowledge graph is used to help solve and manage disaster problems.
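The interoperability gain a knowledge graph provides can be shown with a minimal triple-store sketch: once flood records and weather observations share graph vocabulary, a cross-domain question becomes a simple join over triples. The triples, prefixes, and term names below are invented for illustration and are not the study's actual Korean datasets or vocabulary.

```python
# Minimal illustration of linking heterogeneous disaster records through a
# shared graph vocabulary. All triples and term names are invented examples.

triples = {
    ("flood:Event_2020_07", "rdf:type",        "dis:FloodEvent"),
    ("flood:Event_2020_07", "dis:occurredIn",  "geo:Seoul"),
    ("weather:Obs_991",     "dis:observedAt",  "geo:Seoul"),
    ("weather:Obs_991",     "dis:rainfallMm",  "310"),
}

def objects(subject, predicate):
    return {o for s, p, o in triples if s == subject and p == predicate}

# Cross-domain join: where did the flood occur, and what rainfall was observed there?
place = next(iter(objects("flood:Event_2020_07", "dis:occurredIn")))
obs = {s for s, p, o in triples if p == "dis:observedAt" and o == place}
rain = {o for s, p, o in triples if s in obs and p == "dis:rainfallMm"}
print(place, rain)  # geo:Seoul {'310'}
```

The flood record and the weather observation come from different "organizations" here, yet the shared place identifier `geo:Seoul` lets one query traverse both, which is the heterogeneity resolution the abstract targets.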


1999 ◽  
Vol 33 (3) ◽  
pp. 55-66 ◽  
Author(s):  
L. Charles Sun

An interactive data access and retrieval system, developed at the U.S. National Oceanographic Data Center (NODC) and available at http://www.nodc.noaa.gov, is presented in this paper. The purposes of this paper are: (1) to illustrate the procedures for quality control and loading of oceanographic data into the NODC ocean databases and (2) to describe the development of a system to manage, visualize, and disseminate the NODC data holdings over the Internet. The objective of the system is to provide easy access to data required by data assimilation models. With advances in the scientific understanding of ocean dynamics, data assimilation models require the synthesis of data from a variety of sources. Modern intelligent data systems usually involve integrating distributed heterogeneous data and information sources. As the repository for oceanographic data, NOAA's National Oceanographic Data Center is in a unique position to develop such a data system. In support of data assimilation needs, NODC has developed a system to facilitate browsing of the oceanographic environmental data and information available online at NODC. Users may select oceanographic data based on geographic areas, time periods, and measured parameters. Once the selection is complete, users may produce a station location plot, produce plots of the parameters, or retrieve the data.
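The selection step described above, filtering station records by geographic area and time period before plotting or retrieval, amounts to a bounding-box and date-range filter. A minimal sketch follows; the station records and field names are invented examples, not NODC's actual data model.

```python
# Sketch of selecting oceanographic station records by geographic bounding
# box and time period. Records and field names are invented examples.
from datetime import date

stations = [
    {"id": "ST01", "lat": 36.5, "lon": -122.0, "time": date(1998, 6, 1), "temp_c": 14.2},
    {"id": "ST02", "lat": 10.0, "lon": -140.0, "time": date(1998, 6, 3), "temp_c": 27.8},
    {"id": "ST03", "lat": 35.9, "lon": -121.4, "time": date(1997, 1, 9), "temp_c": 13.1},
]

def select(records, lat_range, lon_range, start, end):
    return [r for r in records
            if lat_range[0] <= r["lat"] <= lat_range[1]
            and lon_range[0] <= r["lon"] <= lon_range[1]
            and start <= r["time"] <= end]

hits = select(stations, (30, 40), (-125, -120), date(1998, 1, 1), date(1998, 12, 31))
print([r["id"] for r in hits])  # ['ST01']
```

The selected records can then feed a station location plot or parameter plots, mirroring the workflow the abstract outlines.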


Author(s):  
Денис Валерьевич Сикулер

This article reviews ten Internet resources for finding data for various tasks related to machine learning and artificial intelligence. Both widely known sites (such as Kaggle and the Registry of Open Data on AWS) and less popular or highly specialized resources (such as The Big Bad NLP Database and Common Crawl) are covered. All of the resources provide free access to data, and in most cases not even registration is required. For each resource, the characteristics and features concerning searching for and obtaining datasets are described. The following sites are included in the review: Kaggle, Google Research, Microsoft Research Open Data, Registry of Open Data on AWS, Harvard Dataverse Repository, Zenodo, the Open Data portal of the Russian Federation, World Bank, The Big Bad NLP Database, and Common Crawl.


2019 ◽  
pp. 254-277 ◽  
Author(s):  
Ying Zhang ◽  
Chaopeng Li ◽  
Na Chen ◽  
Shaowen Liu ◽  
Liming Du ◽  
...  

Since large amounts of geospatial data are produced by various sources, geospatial data integration is difficult because of the shortage of semantics. Although standardised data formats and data access protocols, such as the Web Feature Service (WFS), enable end-users to access heterogeneous data stored in different formats from various sources, the process is still time-consuming and ineffective due to the lack of semantics. To solve this problem, a prototype for geospatial data integration is proposed that addresses four problems: geospatial data retrieving, modeling, linking, and integrating. Four kinds of geospatial data sources are used to evaluate the performance of the proposed approach. The experimental results illustrate that the proposed linking method achieves high performance in generating matched candidate record pairs in terms of Reduction Ratio (RR), Pairs Completeness (PC), Pairs Quality (PQ), and F-score. The integration results show that each data source gains substantial Complementary Completeness (CC) and Increased Completeness (IC).


2019 ◽  
pp. 230-253
Author(s):  
Ying Zhang ◽  
Chaopeng Li ◽  
Na Chen ◽  
Shaowen Liu ◽  
Liming Du ◽  
...  

Since large amounts of geospatial data are produced by various sources and stored in incompatible formats, geospatial data integration is difficult because of the shortage of semantics. Although standardised data formats and data access protocols, such as the Web Feature Service (WFS), enable end-users to access heterogeneous data stored in different formats from various sources, the process is still time-consuming and ineffective due to the lack of semantics. To solve this problem, a prototype for geospatial data integration is proposed that addresses four problems: geospatial data retrieving, modeling, linking, and integrating. First, we provide a uniform integration paradigm for users to retrieve geospatial data. Then, we align the retrieved geospatial data in the modeling process to eliminate heterogeneity with the help of Karma. Our main contribution focuses on the third problem. Previous work defined a set of semantic rules for performing the linking process. However, geospatial data exhibit specific geospatial relationships that are significant for linking but cannot be handled by Semantic Web techniques directly. We take advantage of these unique features of geospatial data to implement the linking process. In addition, previous work runs into a complicated problem when the geospatial data sources are in different languages. In contrast, our proposed linking algorithms are endowed with a translation function, which saves the cost of translating among geospatial sources in different languages. Finally, the geospatial data are integrated by eliminating data redundancy and combining the complementary properties of the linked records. We adopt four geospatial data sources, namely OpenStreetMap (OSM), Wikimapia, USGS, and EPA, to evaluate the performance of the proposed approach. The experimental results illustrate that the proposed linking method achieves high performance in generating matched candidate record pairs in terms of Reduction Ratio (RR), Pairs Completeness (PC), Pairs Quality (PQ), and F-score. The integration results show that each data source gains substantial Complementary Completeness (CC) and Increased Completeness (IC).
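The evaluation measures named above are standard in the record linkage literature: Reduction Ratio measures how many of all possible pairs the candidate generation pruned away, Pairs Completeness and Pairs Quality are the recall and precision of the candidate set with respect to the true matches, and F-score is their harmonic mean. A sketch of how they are computed, with invented numbers:

```python
# Reduction Ratio (RR), Pairs Completeness (PC), Pairs Quality (PQ), and
# F-score as commonly defined in the record linkage literature.
# The candidate pairs and true matches below are invented examples.

def linkage_metrics(total_pairs, candidates, true_matches):
    found = candidates & true_matches
    rr = 1 - len(candidates) / total_pairs          # fraction of pairs pruned away
    pc = len(found) / len(true_matches)             # recall of the candidate set
    pq = len(found) / len(candidates)               # precision of the candidate set
    f = 2 * pc * pq / (pc + pq) if pc + pq else 0.0
    return rr, pc, pq, f

candidates = {(1, "a"), (2, "b"), (3, "c"), (4, "d")}   # pairs the linker proposed
true_matches = {(1, "a"), (2, "b"), (5, "e")}           # gold-standard matches
print(linkage_metrics(100, candidates, true_matches))
```

With 100 possible pairs, 4 candidates, and 2 of the 3 true matches found, this yields RR = 0.96, PC ≈ 0.67, PQ = 0.5, and F ≈ 0.57: a good linking method keeps RR high without sacrificing PC.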



Author(s):  
Mariana Damova ◽  
Atanas Kiryakov ◽  
Maurice Grinberg ◽  
Michael K. Bergman ◽  
Frédérick Giasson ◽  
...  

The chapter introduces the process of developing two upper-level ontologies, PROTON and UMBEL, into reference ontologies and integrating them into the so-called Reference Knowledge Stack (RKS). It is argued that the RKS is an important step in the efforts of the Linked Open Data (LOD) project to transform the Web into a global data space with diverse real data available for review and analysis. The RKS is intended to make interoperability between published datasets much more efficient than it is now. The approach discussed in the chapter consists of developing reference layers of upper-level ontologies by mapping them to certain LOD schemata and assigning instance data to them so that they cover a reasonable portion of the LOD datasets. The chapter presents the methods (manual and semi-automatic) used in creating the RKS and gives examples that illustrate its advantages for managing highly heterogeneous data and its usefulness in real-life knowledge-intensive applications.


Author(s):  
Roel During ◽  
Marcel Pleijte ◽  
Rosalie I. van Dam ◽  
Irini E. Salverda

Open data and citizen-led initiatives can be both friends and foes. Where official data are available and 'open', they not only encourage increased public participation but can also generate the production and scrutiny of new material, potentially of benefit to the original provider and others, official or otherwise. In this way, official open data can be seen to improve democracy or, more accurately, so-called 'participative democracy'. On the other hand, the public is not always eager to share personal information in the most open ways, even though private and sometimes sensitive information is required to initiate projects of societal benefit in difficult times. Many citizens appear content to channel personal information exchange via social media instead of putting it on public websites: the perceived benefits of sharing and complete openness do not outweigh the disadvantages or fear of regulation. This is caused by various sources of contingency, such as the different appeals made to citizens in the discourses on the participation society and on representative democracy, which call for social openness in the former and privacy protection in the latter. Moreover, the discourse on open data is an economic argument fighting the rules of privacy rather than a promotion of open data as a prerequisite for social action. Civil servants acknowledge that access to open data via all sorts of apps could contribute to the mushrooming of public initiatives, but they are reluctant to release person-related sensitive information. The authors describe and discuss this dilemma in the context of recent case studies from the Netherlands concerning governmental programmes on open data and citizens' initiatives, highlighting the governance constraints and uncertainties as well as citizens' concerns about data access and data sharing.
It is shown that openness has a different meaning and understanding in the participation society and in representative democracy, i.e. the tension between sharing private social information and transparency. Looking at openness from both sides reveals a double contingency: understandings of and intentions about this openness invoke mutually reinforcing uncertainties, which hamper citizens' eagerness to participate. The chapter concludes with a practical recommendation for improving data governance.

