Grace: An Efficient Parallel SPARQL Query System over Large-Scale RDF Data

Author(s):  
Xiang Kang ◽  
Yuying Zhao ◽  
Pingpeng Yuan ◽  
Hai Jin


2013 ◽
Vol 441 ◽  
pp. 691-694
Author(s):  
Yi Qun Zeng ◽  
Jing Bin Wang

With the rapid development of information technology, data is growing explosively, and handling large-scale data is becoming more and more important. Based on the characteristics of RDF data, we propose to compress RDF data. We construct an index structure called the PAR-Tree Index, and then execute queries using the MapReduce parallel computing framework together with the PAR-Tree Index. Experimental results show that the algorithm can improve the efficiency of queries over large data.
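The abstract does not describe the PAR-Tree Index itself, but the compression step it builds on is commonly dictionary encoding. A minimal sketch of that idea, with illustrative triples (the class and function names here are assumptions, not the paper's):

```python
# Sketch of dictionary-encoded RDF compression: each distinct term (URI or
# literal) is mapped to a small integer, so triples become fixed-size integer
# tuples instead of long repeated strings.

class TripleDictionary:
    def __init__(self):
        self.term_to_id = {}
        self.id_to_term = []

    def encode(self, term):
        # Assign the next integer id on first sight of a term.
        if term not in self.term_to_id:
            self.term_to_id[term] = len(self.id_to_term)
            self.id_to_term.append(term)
        return self.term_to_id[term]

    def decode(self, term_id):
        return self.id_to_term[term_id]

def compress(triples, dictionary):
    """Encode (s, p, o) string triples as integer triples."""
    return [tuple(dictionary.encode(t) for t in triple) for triple in triples]

triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob",   "foaf:knows", "ex:carol"),
    ("ex:alice", "foaf:name",  '"Alice"'),
]
d = TripleDictionary()
encoded = compress(triples, d)
print(encoded)  # [(0, 1, 2), (2, 1, 3), (0, 4, 5)]
```

Repeated terms such as `ex:alice` and `foaf:knows` are stored once in the dictionary, and any index built over the triples can operate on compact integers.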


2017 ◽  
Vol 44 (2) ◽  
pp. 203-229 ◽  
Author(s):  
Javier D Fernández ◽  
Miguel A Martínez-Prieto ◽  
Pablo de la Fuente Redondo ◽  
Claudio Gutiérrez

The publication of semantic web data, commonly represented in Resource Description Framework (RDF), has experienced outstanding growth over the last few years. Data from all fields of knowledge are shared publicly and interconnected in active initiatives such as Linked Open Data. However, despite the increasing availability of applications managing large-scale RDF information such as RDF stores and reasoning tools, little attention has been given to the structural features emerging in real-world RDF data. Our work addresses this issue by proposing specific metrics to characterise RDF data. We specifically focus on revealing the redundancy of each data set, as well as common structural patterns. We evaluate the proposed metrics on several data sets, which cover a wide range of designs and models. Our findings provide a basis for more efficient RDF data structures, indexes and compressors.
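Metrics of the kind the article proposes can be computed directly from a triple list. A hedged sketch with two illustrative measures (the metric names and example triples are this sketch's assumptions, not the paper's exact definitions):

```python
from collections import Counter

triples = [
    ("ex:a", "rdf:type",   "ex:Person"),
    ("ex:b", "rdf:type",   "ex:Person"),
    ("ex:a", "foaf:knows", "ex:b"),
    ("ex:a", "foaf:name",  '"A"'),
]

def predicate_ratio(triples):
    """Distinct predicates per triple: low values hint at schema-level redundancy."""
    return len({p for _, p, _ in triples}) / len(triples)

def subject_out_degrees(triples):
    """Triples per subject: the out-degree distribution of the RDF graph."""
    return Counter(s for s, _, _ in triples)

print(predicate_ratio(triples))       # 3 distinct predicates / 4 triples = 0.75
print(subject_out_degrees(triples))
```

A skewed out-degree distribution and a small predicate vocabulary are exactly the structural patterns that compressors and indexes can exploit.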


Author(s):  
Xin Wang ◽  
Longxiang Jiang ◽  
Hong Shi ◽  
Zhiyong Feng ◽  
Pufeng Du
Author(s):  
Zongmin Ma ◽  
Li Yan

The Resource Description Framework (RDF) is a model for representing information resources on the web. With the widespread acceptance of RDF as the de-facto standard recommended by the W3C (World Wide Web Consortium) for the representation and exchange of information on the web, a huge amount of RDF data is proliferating and becoming available. RDF data management is therefore of increasing importance and has attracted attention in the database community as well as the Semantic Web community. Much work has been devoted to proposing different solutions for storing large-scale RDF data efficiently, and NoSQL ("not only SQL") databases have been used for scalable RDF data storage. This chapter focuses on using various NoSQL databases to store massive RDF data. An up-to-date overview of the current state of the art in RDF data storage in NoSQL databases is provided, and the chapter offers suggestions for future research.
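A common way to lay RDF out in a key-value or wide-column NoSQL store is to keep one index per triple permutation, so any triple pattern can be answered by a lookup or prefix scan. An illustrative sketch using plain dicts to stand in for the store (not any particular product's API):

```python
# Three permutation indexes: SPO, POS, OSP. In a real NoSQL store each would
# be a table/column family keyed by the concatenated components.

def make_indexes(triples):
    spo, pos, osp = {}, {}, {}
    for s, p, o in triples:
        spo.setdefault(s, {}).setdefault(p, set()).add(o)
        pos.setdefault(p, {}).setdefault(o, set()).add(s)
        osp.setdefault(o, {}).setdefault(s, set()).add(p)
    return spo, pos, osp

triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:carol", "foaf:knows", "ex:bob"),
]
spo, pos, osp = make_indexes(triples)

# Pattern (?s, foaf:knows, ex:bob): answered from the POS index in one lookup.
subjects = pos["foaf:knows"]["ex:bob"]
print(sorted(subjects))  # ['ex:alice', 'ex:carol']
```

The trade-off is the usual one: three (or six) copies of the data buy constant-time access for every pattern shape.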



2019 ◽  
Vol 15 (2) ◽  
pp. 69-87 ◽  
Author(s):  
Hongyan Yun ◽  
Ying He ◽  
Li Lin ◽  
Xiaohong Wang

The purpose of data integration is to combine multi-source heterogeneous data, and ontologies address the semantic description of such data. The authors propose a practical approach based on ontology modeling and the Karma information-integration toolkit for fast data integration, and demonstrate an application example in detail. The Armed Conflict Location & Event Data Project (ACLED) is a publicly available conflict event dataset designed for disaggregated conflict analysis and crisis mapping. The authors analyzed the ACLED dataset and domain knowledge to build an Armed Conflict Event ontology, then constructed Karma models to integrate ACLED datasets and publish RDF data, using SPARQL queries to check the correctness of the published RDF data. The authors designed and developed an ACLED Query System based on technologies such as the Jena API, CanvasJS, and the Baidu API, which helps governments and researchers analyze regional conflict events and supports early crisis warning, and which verifies the validity of the constructed ontology and the correctness of the Karma modeling.
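The correctness check described can be pictured as running ASK-style graph patterns against the published triples. A minimal sketch, with patterns written as Python tuples rather than SPARQL syntax; the ontology terms (`acled:Event`, `acled:country`) are illustrative assumptions:

```python
# Published RDF data as a set of (subject, predicate, object) triples.
published = {
    ("ex:event1", "rdf:type", "acled:Event"),
    ("ex:event1", "acled:country", '"Somalia"'),
}

def ask(pattern, triples):
    """SPARQL ASK-style check: does any triple match? None acts as a wildcard."""
    return any(all(q is None or q == t for q, t in zip(pattern, triple))
               for triple in triples)

# Verify an expected event was published, and that its country is present.
print(ask((None, "rdf:type", "acled:Event"), published))          # True
print(ask(("ex:event1", "acled:country", '"Somalia"'), published))  # True
```

The same checks would be issued to a Jena-backed endpoint as `ASK { ?e rdf:type acled:Event }` and so on; the point is that each expected fact becomes a yes/no query over the published graph.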


2013 ◽  
Vol 441 ◽  
pp. 970-973
Author(s):  
Yan Qin Zhang ◽  
Jing Bin Wang

With the development of the Semantic Web, RDF datasets have grown rapidly, raising the problem of querying massive RDF data. Using distributed techniques to execute SPARQL (SPARQL Protocol and RDF Query Language) queries is a new way of solving this problem. At present, most Hadoop-based RDF query strategies have to use multiple MapReduce jobs to complete a task, resulting in wasted time. To overcome this drawback, this paper proposes the MRQJ (using MapReduce to query and join) algorithm, which first uses a greedy strategy to generate a join plan and then needs only one MapReduce job to obtain the query results during SPARQL query execution. Finally, a comparative experiment on the LUBM (Lehigh University Benchmark) dataset shows that the MRQJ method has a great advantage when queries are more complicated.
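The single-job idea can be sketched as follows: both triple patterns emit their bindings keyed on the shared join variable in one map phase, and a single reduce phase combines them. This is a hedged illustration of a one-job reduce-side join, not MRQJ's actual join-plan generation:

```python
from collections import defaultdict

# Query: SELECT ?x ?y WHERE { ?x foaf:knows ?y . ?y rdf:type ex:Student }
triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob",   "rdf:type",   "ex:Student"),
]

def map_phase(triples):
    # Both patterns key their bindings on the join variable ?y.
    for s, p, o in triples:
        if p == "foaf:knows":                      # pattern 1: ?x foaf:knows ?y
            yield o, ("p1", s)
        if p == "rdf:type" and o == "ex:Student":  # pattern 2: ?y rdf:type ex:Student
            yield s, ("p2",)

def reduce_phase(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    for y, values in groups.items():
        xs = [v[1] for v in values if v[0] == "p1"]
        if any(v[0] == "p2" for v in values):      # ?y satisfies both patterns
            for x in xs:
                yield x, y

results = list(reduce_phase(map_phase(triples)))
print(results)  # [('ex:alice', 'ex:bob')]
```

Because every pattern is mapped in the same pass and joined in the same reduce, no intermediate MapReduce job is materialized between the two patterns.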


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 122405-122412 ◽  
Author(s):  
Jinhyun Ahn ◽  
Dong-Hyuk Im

2016 ◽  
Vol 43 (6) ◽  
pp. 852-865 ◽  
Author(s):  
Sejin Chun ◽  
Jooik Jung ◽  
Seungmin Seo ◽  
Wonwoo Ro ◽  
Kyong-Ho Lee

To satisfy a user’s complex requirements, Resource Description Framework (RDF) Stream Processing (RSP) systems envision the fusion of remote RDF data with semantic streams, using common data models to query semantic streams continuously. While streaming data change at a high rate and are pushed into RSP systems, the remote RDF data are retrieved from different remote sources. With the growth of SPARQL endpoints that provide access to remote RDF data, RSP systems can easily integrate the remote data with streams. Such integration provides new opportunities for mixing static (or quasi-static) data with streams on a large scale. However, current RSP systems do not offer any optimisation for this integration. In this article, we present an adaptive plan-based approach to efficiently integrate semantic streams with static data from a remote source. We create a query execution plan based on temporal constraints among constituent services for the timely acquisition of remote data. To predict the change of remote sources in real time, we propose an adaptive process of detecting a source update, forecasting future updates, deciding on a new plan to obtain remote data and reacting to the new plan. We extend a SPARQL query with operators for describing the multiple strategies of the proposed adaptive process. Experimental results show that our approach is more efficient than conventional RSP systems in distributed settings.
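The forecasting step can be pictured with a deliberately simple policy: observe when the remote source changed and predict the next change from the mean inter-update interval, so the next remote fetch can be scheduled just before it. This is an assumption-laden sketch, not the article's forecasting model:

```python
# Forecast the next update of a remote SPARQL source from its observed
# update timestamps, using the mean inter-update interval as the predictor.

def forecast_next_update(update_times):
    """Predict the next update time from the mean interval so far."""
    intervals = [b - a for a, b in zip(update_times, update_times[1:])]
    mean_interval = sum(intervals) / len(intervals)
    return update_times[-1] + mean_interval

# Remote source observed updating at t = 0, 10, 20 seconds: a steady cadence.
observed = [0.0, 10.0, 20.0]
next_update = forecast_next_update(observed)
print(next_update)  # 30.0
```

An adaptive system would refresh its cached remote data shortly before `next_update` and fall back to reactive re-planning when the forecast misses.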

