Grace: An Efficient Parallel SPARQL Query System over Large-Scale RDF Data

Author(s):  
Xiang Kang ◽  
Yuying Zhao ◽  
Pingpeng Yuan ◽  
Hai Jin


2013 ◽
Vol 441 ◽  
pp. 691-694
Author(s):  
Yi Qun Zeng ◽  
Jing Bin Wang

With the rapid development of information technology, data is growing explosively, and handling large-scale data is becoming more and more important. Based on the characteristics of RDF data, we propose to compress RDF data. We construct an index structure called the PAR-Tree Index, and then execute queries using the MapReduce parallel computing framework together with the PAR-Tree Index. Experimental results show that the algorithm can improve the efficiency of queries over large data.
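The abstract does not describe the PAR-Tree Index itself, but the compression step it builds on is commonly dictionary encoding. A minimal sketch of that idea, with illustrative triples (the class and function names here are assumptions, not the paper's):

```python
# Sketch of dictionary-encoded RDF compression: each distinct term (URI or
# literal) is mapped to a small integer, so triples become fixed-size integer
# tuples instead of long repeated strings.

class TripleDictionary:
    def __init__(self):
        self.term_to_id = {}
        self.id_to_term = []

    def encode(self, term):
        # Assign the next integer id on first sight of a term.
        if term not in self.term_to_id:
            self.term_to_id[term] = len(self.id_to_term)
            self.id_to_term.append(term)
        return self.term_to_id[term]

    def decode(self, term_id):
        return self.id_to_term[term_id]

def compress(triples, dictionary):
    """Encode (s, p, o) string triples as integer triples."""
    return [tuple(dictionary.encode(t) for t in triple) for triple in triples]

triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob",   "foaf:knows", "ex:carol"),
    ("ex:alice", "foaf:name",  '"Alice"'),
]
d = TripleDictionary()
encoded = compress(triples, d)
print(encoded)  # [(0, 1, 2), (2, 1, 3), (0, 4, 5)]
```

Repeated terms such as `ex:alice` and `foaf:knows` are stored once in the dictionary, and any index built over the triples can operate on compact integers.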


2017 ◽  
Vol 44 (2) ◽  
pp. 203-229 ◽  
Author(s):  
Javier D Fernández ◽  
Miguel A Martínez-Prieto ◽  
Pablo de la Fuente Redondo ◽  
Claudio Gutiérrez

The publication of semantic web data, commonly represented in Resource Description Framework (RDF), has experienced outstanding growth over the last few years. Data from all fields of knowledge are shared publicly and interconnected in active initiatives such as Linked Open Data. However, despite the increasing availability of applications managing large-scale RDF information such as RDF stores and reasoning tools, little attention has been given to the structural features emerging in real-world RDF data. Our work addresses this issue by proposing specific metrics to characterise RDF data. We specifically focus on revealing the redundancy of each data set, as well as common structural patterns. We evaluate the proposed metrics on several data sets, which cover a wide range of designs and models. Our findings provide a basis for more efficient RDF data structures, indexes and compressors.
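Metrics of the kind the article proposes can be computed directly from a triple list. A hedged sketch with two illustrative measures (the metric names and example triples are this sketch's assumptions, not the paper's exact definitions):

```python
from collections import Counter

triples = [
    ("ex:a", "rdf:type",   "ex:Person"),
    ("ex:b", "rdf:type",   "ex:Person"),
    ("ex:a", "foaf:knows", "ex:b"),
    ("ex:a", "foaf:name",  '"A"'),
]

def predicate_ratio(triples):
    """Distinct predicates per triple: low values hint at schema-level redundancy."""
    return len({p for _, p, _ in triples}) / len(triples)

def subject_out_degrees(triples):
    """Triples per subject: the out-degree distribution of the RDF graph."""
    return Counter(s for s, _, _ in triples)

print(predicate_ratio(triples))       # 3 distinct predicates / 4 triples = 0.75
print(subject_out_degrees(triples))
```

A skewed out-degree distribution and a small predicate vocabulary are exactly the structural patterns that compressors and indexes can exploit.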


Author(s):  
Xin Wang ◽  
Longxiang Jiang ◽  
Hong Shi ◽  
Zhiyong Feng ◽  
Pufeng Du
Author(s):  
Zongmin Ma ◽  
Li Yan

The Resource Description Framework (RDF) is a model for representing information resources on the web. With the widespread acceptance of RDF as the de-facto standard recommended by the W3C (World Wide Web Consortium) for the representation and exchange of information on the web, a huge amount of RDF data is proliferating and becoming available. RDF data management is therefore of increasing importance and has attracted attention in the database community as well as the Semantic Web community. Much work has been devoted to proposing different solutions for storing large-scale RDF data efficiently, and NoSQL ("not only SQL") databases have been used for scalable RDF data storage. This chapter focuses on using various NoSQL databases to store massive RDF data. An up-to-date overview of the current state of the art in RDF data storage in NoSQL databases is provided, and the chapter offers suggestions for future research.
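A common way to lay RDF out in a key-value or wide-column NoSQL store is to keep one index per triple permutation, so any triple pattern can be answered by a lookup or prefix scan. An illustrative sketch using plain dicts to stand in for the store (not any particular product's API):

```python
# Three permutation indexes: SPO, POS, OSP. In a real NoSQL store each would
# be a table/column family keyed by the concatenated components.

def make_indexes(triples):
    spo, pos, osp = {}, {}, {}
    for s, p, o in triples:
        spo.setdefault(s, {}).setdefault(p, set()).add(o)
        pos.setdefault(p, {}).setdefault(o, set()).add(s)
        osp.setdefault(o, {}).setdefault(s, set()).add(p)
    return spo, pos, osp

triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:carol", "foaf:knows", "ex:bob"),
]
spo, pos, osp = make_indexes(triples)

# Pattern (?s, foaf:knows, ex:bob): answered from the POS index in one lookup.
subjects = pos["foaf:knows"]["ex:bob"]
print(sorted(subjects))  # ['ex:alice', 'ex:carol']
```

The trade-off is the usual one: three (or six) copies of the data buy constant-time access for every pattern shape.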



2019 ◽  
Vol 15 (2) ◽  
pp. 69-87 ◽  
Author(s):  
Hongyan Yun ◽  
Ying He ◽  
Li Lin ◽  
Xiaohong Wang

The purpose of data integration is to combine multi-source heterogeneous data, and ontologies address the semantic description of such data. The authors propose a practical approach based on ontology modeling and the Karma information-integration toolkit for fast data integration, and demonstrate an application example in detail. The Armed Conflict Location & Event Data Project (ACLED) is a publicly available conflict event dataset designed for disaggregated conflict analysis and crisis mapping. The authors analyzed the ACLED dataset and domain knowledge to build an Armed Conflict Event ontology, then constructed Karma models to integrate ACLED datasets and publish RDF data, using SPARQL queries to check the correctness of the published RDF data. The authors designed and developed an ACLED Query System based on technologies such as the Jena API, CanvasJS, and the Baidu API, which helps governments and researchers analyze regional conflict events and supports early crisis warning, and which verifies the validity of the constructed ontology and the correctness of the Karma modeling.
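The correctness check described can be pictured as running ASK-style graph patterns against the published triples. A minimal sketch, with patterns written as Python tuples rather than SPARQL syntax; the ontology terms (`acled:Event`, `acled:country`) are illustrative assumptions:

```python
# Published RDF data as a set of (subject, predicate, object) triples.
published = {
    ("ex:event1", "rdf:type", "acled:Event"),
    ("ex:event1", "acled:country", '"Somalia"'),
}

def ask(pattern, triples):
    """SPARQL ASK-style check: does any triple match? None acts as a wildcard."""
    return any(all(q is None or q == t for q, t in zip(pattern, triple))
               for triple in triples)

# Verify an expected event was published, and that its country is present.
print(ask((None, "rdf:type", "acled:Event"), published))          # True
print(ask(("ex:event1", "acled:country", '"Somalia"'), published))  # True
```

The same checks would be issued to a Jena-backed endpoint as `ASK { ?e rdf:type acled:Event }` and so on; the point is that each expected fact becomes a yes/no query over the published graph.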


2013 ◽  
Vol 441 ◽  
pp. 970-973
Author(s):  
Yan Qin Zhang ◽  
Jing Bin Wang

With the development of the Semantic Web, RDF datasets have grown rapidly, raising the problem of querying massive RDF data. Using distributed techniques to execute SPARQL (SPARQL Protocol and RDF Query Language) queries is a new way of solving this problem. At present, most Hadoop-based RDF query strategies have to use multiple MapReduce jobs to complete a task, resulting in wasted time. To overcome this drawback, this paper proposes the MRQJ (using MapReduce to query and join) algorithm, which first uses a greedy strategy to generate a join plan and then needs only one MapReduce job to obtain the query results during SPARQL query execution. Finally, a comparative experiment on the LUBM (Lehigh University Benchmark) dataset shows that the MRQJ method has a great advantage when queries are more complicated.
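The single-job idea can be sketched as follows: both triple patterns emit their bindings keyed on the shared join variable in one map phase, and a single reduce phase combines them. This is a hedged illustration of a one-job reduce-side join, not MRQJ's actual join-plan generation:

```python
from collections import defaultdict

# Query: SELECT ?x ?y WHERE { ?x foaf:knows ?y . ?y rdf:type ex:Student }
triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob",   "rdf:type",   "ex:Student"),
]

def map_phase(triples):
    # Both patterns key their bindings on the join variable ?y.
    for s, p, o in triples:
        if p == "foaf:knows":                      # pattern 1: ?x foaf:knows ?y
            yield o, ("p1", s)
        if p == "rdf:type" and o == "ex:Student":  # pattern 2: ?y rdf:type ex:Student
            yield s, ("p2",)

def reduce_phase(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    for y, values in groups.items():
        xs = [v[1] for v in values if v[0] == "p1"]
        if any(v[0] == "p2" for v in values):      # ?y satisfies both patterns
            for x in xs:
                yield x, y

results = list(reduce_phase(map_phase(triples)))
print(results)  # [('ex:alice', 'ex:bob')]
```

Because every pattern is mapped in the same pass and joined in the same reduce, no intermediate MapReduce job is materialized between the two patterns.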


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 122405-122412 ◽  
Author(s):  
Jinhyun Ahn ◽  
Dong-Hyuk Im

2016 ◽  
Vol 43 (6) ◽  
pp. 852-865 ◽  
Author(s):  
Sejin Chun ◽  
Jooik Jung ◽  
Seungmin Seo ◽  
Wonwoo Ro ◽  
Kyong-Ho Lee

To satisfy a user’s complex requirements, Resource Description Framework (RDF) Stream Processing (RSP) systems envision the fusion of remote RDF data with semantic streams, using common data models to query semantic streams continuously. While streaming data change at a high rate and are pushed into RSP systems, the remote RDF data are retrieved from different remote sources. With the growth of SPARQL endpoints that provide access to remote RDF data, RSP systems can easily integrate the remote data with streams. Such integration provides new opportunities for mixing static (or quasi-static) data with streams on a large scale. However, current RSP systems do not offer any optimisation for this integration. In this article, we present an adaptive plan-based approach to efficiently integrate semantic streams with static data from a remote source. We create a query execution plan based on temporal constraints among constituent services for the timely acquisition of remote data. To predict the change of remote sources in real time, we propose an adaptive process of detecting a source update, forecasting future updates, deciding on a new plan to obtain remote data and reacting to the new plan. We extend a SPARQL query with operators for describing the multiple strategies of the proposed adaptive process. Experimental results show that our approach is more efficient than conventional RSP systems in distributed settings.
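The forecasting step can be pictured with a deliberately simple policy: observe when the remote source changed and predict the next change from the mean inter-update interval, so the next remote fetch can be scheduled just before it. This is an assumption-laden sketch, not the article's forecasting model:

```python
# Forecast the next update of a remote SPARQL source from its observed
# update timestamps, using the mean inter-update interval as the predictor.

def forecast_next_update(update_times):
    """Predict the next update time from the mean interval so far."""
    intervals = [b - a for a, b in zip(update_times, update_times[1:])]
    mean_interval = sum(intervals) / len(intervals)
    return update_times[-1] + mean_interval

# Remote source observed updating at t = 0, 10, 20 seconds: a steady cadence.
observed = [0.0, 10.0, 20.0]
next_update = forecast_next_update(observed)
print(next_update)  # 30.0
```

An adaptive system would refresh its cached remote data shortly before `next_update` and fall back to reactive re-planning when the forecast misses.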

