Dynamic Partitioning Supporting Load Balancing for Distributed RDF Graph Stores

Symmetry ◽  
2019 ◽  
Vol 11 (7) ◽  
pp. 926
Author(s):  
Kyoungsoo Bok ◽  
Junwon Kim ◽  
Jaesoo Yoo

Various resource description framework (RDF) partitioning methods have been studied for the efficient distributed processing of large RDF graphs. An RDF graph has symmetrical characteristics because subject and object can be used interchangeably if the predicate is changed. This paper proposes a dynamic partitioning method for RDF graphs to support load balancing in distributed environments where data insertions and changes continue to occur. The proposed method generates clusters and subclusters, using the usage frequency of RDF subgraphs in queries as the criterion for graph partitioning. It creates clusters by grouping RDF subgraphs with higher usage frequency and subclusters from subgraphs with lower usage frequency. Clusters and subclusters are then balanced across distributed servers using the mean query frequency per server, and graph data are partitioned considering the size of the data stored in each server. The method also minimizes the number of edge cuts connected to clusters and subclusters in order to minimize communication costs between servers. This resolves the concentration of data on specific servers caused by ongoing data changes and additions and enables efficient load balancing among servers. The performance results show that the proposed method significantly outperforms existing partitioning methods in terms of query processing time on distributed servers.
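
As an illustration of the frequency-driven placement idea described above, the sketch below assigns subgraphs to the currently least-loaded server, ordered by query frequency so that heavily used subgraphs are spread first. The function name, data structures, and tie-breaking rule are illustrative assumptions, not the authors' implementation.

```python
def partition_by_usage(subgraphs, query_freq, servers):
    """Assign RDF subgraphs to servers by query usage frequency (illustrative sketch).

    subgraphs  : dict mapping subgraph id -> size in triples
    query_freq : dict mapping subgraph id -> number of queries touching it
    servers    : list of server ids
    Subgraphs are placed, most frequently queried first, on the server with the
    lowest accumulated query load; ties are broken by stored data size.
    """
    load = {s: 0 for s in servers}   # accumulated query frequency per server
    size = {s: 0 for s in servers}   # accumulated triple count per server
    placement = {}

    for g in sorted(subgraphs, key=lambda g: query_freq.get(g, 0), reverse=True):
        target = min(servers, key=lambda s: (load[s], size[s]))
        placement[g] = target
        load[target] += query_freq.get(g, 0)
        size[target] += subgraphs[g]
    return placement

# Illustrative use with hypothetical subgraphs and three servers:
plan = partition_by_usage({"g1": 500, "g2": 120, "g3": 900},
                          {"g1": 40, "g2": 5, "g3": 35},
                          ["s1", "s2", "s3"])
```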

2020 ◽  
Vol 16 (2) ◽  
pp. 223-247
Author(s):  
Siham Eddamiri ◽  
Asmaa Benghabrit ◽  
Elmoukhtar Zemmouri

Purpose: The purpose of this paper is to present a generic pipeline for Resource Description Framework (RDF) graph mining and to provide a comprehensive review of each step in the knowledge discovery from data process. The authors also investigate different approaches and combinations for extracting feature vectors from RDF graphs to apply to the clustering and theme identification tasks.

Design/methodology/approach: The proposed methodology comprises four steps. First, the authors generate several graph substructures (Walks, Set of Walks, Walks with backward, and Set of Walks with backward). Second, the authors build neural language models to extract numerical vectors from the generated sequences, using word embedding techniques (Word2Vec and Doc2Vec) combined with term frequency-inverse document frequency (TF-IDF). Third, the authors use the well-known K-means algorithm to cluster the RDF graph. Finally, the authors extract the most relevant rdf:type from the grouped vertices to describe the semantics of each theme by generating labels.

Findings: The experimental evaluation on state-of-the-art data sets (AIFB, BGS and Conference) shows that the combination of Set of Walks with backward, TF-IDF and Doc2Vec gives excellent results. In fact, the clustering results reach more than 97% and 90% in terms of purity and F-measure, respectively. Concerning theme identification, the results show that, using the same combination, the purity and F-measure criteria reach more than 90% for all the considered data sets.

Originality/value: The originality of this paper lies in two aspects: first, a new machine learning pipeline for RDF data is presented; second, an efficient process to identify and extract relevant graph substructures from an RDF graph is proposed. The proposed techniques were combined with different neural language models to improve the accuracy and relevance of the obtained feature vectors that are fed to the clustering mechanism.
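
A minimal sketch of such a pipeline (random walk extraction, Doc2Vec embedding, K-means clustering) using rdflib, gensim (4.x API) and scikit-learn. It only generates forward walks, and the file name, walk parameters, and cluster count are illustrative assumptions rather than the authors' setup.

```python
import random
from rdflib import Graph, RDF
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.cluster import KMeans

def random_walks(g, entity, depth=4, n_walks=10):
    """Generate simple forward walks (alternating predicate/object labels) from an entity."""
    walks = []
    for _ in range(n_walks):
        node, walk = entity, [str(entity)]
        for _ in range(depth):
            edges = list(g.predicate_objects(node))
            if not edges:
                break
            p, o = random.choice(edges)
            walk += [str(p), str(o)]
            node = o
        walks.append(walk)
    return walks

g = Graph().parse("aifb.nt", format="nt")            # hypothetical local copy of a data set
entities = list(set(g.subjects(RDF.type, None)))

# One "document" per entity: all of its walks concatenated into a token sequence.
docs = [TaggedDocument(sum(random_walks(g, e), []), [str(e)]) for e in entities]
model = Doc2Vec(docs, vector_size=100, window=5, min_count=1, epochs=20)
X = [model.dv[str(e)] for e in entities]

# Cluster the entity embeddings; themes would then be labelled by frequent rdf:type values.
labels = KMeans(n_clusters=4, n_init=10).fit_predict(X)
```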


2017 ◽  
Vol 1 (2) ◽  
pp. 84-103 ◽  
Author(s):  
Dong Wang ◽  
Lei Zou ◽  
Dongyan Zhao

Abstract The SPARQL Protocol and RDF Query Language (SPARQL) allows users to issue structural queries over a resource description framework (RDF) graph. However, the lack of a spatiotemporal query language limits the usage of RDF data in spatiotemporal-oriented applications. As the spatiotemporal information in RDF data continuously increases, it is necessary to design an effective and efficient spatiotemporal RDF data management system. In this paper, we formally define spatiotemporal information-integrated RDF data, introduce a spatiotemporal query language that extends SPARQL with spatiotemporal assertions to query such data, and design a novel index and the corresponding query algorithm. The experimental results on a large, real RDF graph integrating spatial and temporal information (> 180 million triples) confirm the superiority of our approach: gst-store outperforms its competitors by more than 20%-30% in most cases.
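
The gst-store index and its extended query syntax are not reproduced here; the sketch below only illustrates the kind of spatiotemporal predicate such an extension evaluates, filtering annotated triples by a bounding box and a time window. All names and the annotation layout are assumptions for illustration.

```python
from datetime import datetime

def within(bbox, point):
    """Axis-aligned bounding-box test; bbox = (min_lat, min_lon, max_lat, max_lon)."""
    min_lat, min_lon, max_lat, max_lon = bbox
    lat, lon = point
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon

def overlaps(interval, window):
    """True if the triple's validity interval overlaps the queried time window."""
    start, end = interval
    w_start, w_end = window
    return start <= w_end and w_start <= end

def spatiotemporal_filter(annotated_triples, bbox, window):
    """Keep triples whose spatial point lies in bbox and whose interval overlaps window.

    annotated_triples: iterable of (s, p, o, (lat, lon), (start, end)) tuples,
    i.e. ordinary triples carrying spatial and temporal annotations.
    """
    return [(s, p, o) for s, p, o, point, interval in annotated_triples
            if within(bbox, point) and overlaps(interval, window)]

# Illustrative use with hypothetical data:
triples = [("ex:storm1", "ex:locatedIn", "ex:Beijing",
            (39.9, 116.4),
            (datetime(2015, 6, 1), datetime(2015, 6, 3)))]
hits = spatiotemporal_filter(triples,
                             bbox=(39.0, 115.0, 41.0, 117.0),
                             window=(datetime(2015, 6, 2), datetime(2015, 6, 4)))
```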


2017 ◽  
Vol 2 (2) ◽  
pp. 41-55 ◽  
Author(s):  
Chunqiu Li ◽  
Shigeo Sugimoto

Abstract

Purpose: The purpose of this paper is to discuss provenance description of metadata terms and of metadata vocabularies as sets of metadata terms. Provenance is crucial information for keeping track of changes to metadata terms and metadata vocabularies for their consistent maintenance.

Design/methodology/approach: The W3C PROV standard for general provenance description and the Resource Description Framework (RDF) are adopted as the base models to formally define provenance description for metadata vocabularies.

Findings: This paper defines a few primitive change types of metadata terms and a provenance description model of metadata terms based on those primitive change types. Examples of provenance description in RDF graphs are also provided to illustrate the proposed model.

Research limitations: The model proposed in this paper is defined based on a few primitive relationships (e.g. addition, deletion and replacement) between the pre-version and post-version of a metadata term. The model is simplified, and practical changes of metadata terms can be more complicated than the primitive relationships discussed in the model.

Practical implications: Formal provenance description of metadata vocabularies can improve the maintainability of metadata vocabularies over time. Conventional maintenance of metadata terms is the maintenance of documents of terms. The proposed model enables effective and automated tracking of the change history of metadata vocabularies using a simple formal description scheme defined on widely used standards.

Originality/value: Changes in metadata vocabularies may cause inconsistencies in the long-term use of metadata. This paper proposes a simple and formal scheme for provenance description of metadata vocabularies. The proposed model works as the basis of automated maintenance of metadata terms and their vocabularies and is applicable to various types of changes.
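
A small sketch of how a term replacement could be recorded with W3C PROV properties in RDF using rdflib. The namespace, term names, and choice of PROV properties are illustrative assumptions; the authors' model may use a different set of change types and properties.

```python
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, RDFS, PROV

EX = Namespace("http://example.org/vocab/")   # hypothetical vocabulary namespace

g = Graph()
old_term = EX["creator-v1"]                   # pre-version of the metadata term
new_term = EX["creator-v2"]                   # post-version of the metadata term
activity = EX["replacement-2017-05"]          # the change event itself

# The post-version term is derived from the pre-version term via a revision.
g.add((new_term, PROV.wasDerivedFrom, old_term))
g.add((new_term, PROV.wasRevisionOf, old_term))
g.add((new_term, PROV.wasGeneratedBy, activity))
g.add((old_term, PROV.wasInvalidatedBy, activity))
g.add((activity, RDF.type, PROV.Activity))
g.add((activity, RDFS.label, Literal("replacement of the definition of 'creator'")))

print(g.serialize(format="turtle"))
```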


Author(s):  
Kamalendu Pal

Many industries prefer worldwide business operations due to the economic advantages of globalization in product design and development. These industries increasingly operate globalized multi-tier supply chains and deliver products and services all over the world. This global approach produces huge amounts of heterogeneous data residing in various business operations, and the integration of these data plays an important role. Integrating data from multiple heterogeneous sources requires dealing with different data models, database schemas, and query languages. This chapter presents a semantic web technology-based data integration framework that uses relational databases and XML data with the help of an ontology. To model different source schemas, the chapter proposes a method based on resource description framework (RDF) graph patterns and query rewriting techniques. The semantic translation between the source schema and the RDF ontology is described using the query and transformation language SPARQL.
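
A hedged sketch of the general idea: a relational row is lifted into RDF under a source-schema vocabulary, and a SPARQL CONSTRUCT query rewrites it into a target ontology. The table, namespaces, and mapping are hypothetical and only illustrate graph-pattern-based translation, not the chapter's concrete framework.

```python
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF

SRC = Namespace("http://example.org/source/")    # hypothetical source-schema vocabulary
ONT = Namespace("http://example.org/ontology/")  # hypothetical target ontology

# Lift a relational row (e.g. from a PRODUCT table) into RDF using the source vocabulary.
g = Graph()
row = {"id": "p42", "name": "bearing", "supplier": "ACME"}
product = SRC["product/" + row["id"]]
g.add((product, RDF.type, SRC.Product))
g.add((product, SRC.name, Literal(row["name"])))
g.add((product, SRC.suppliedBy, Literal(row["supplier"])))

# Rewrite source-schema triples into the target ontology with a SPARQL CONSTRUCT query.
mapping = """
PREFIX src: <http://example.org/source/>
PREFIX ont: <http://example.org/ontology/>
CONSTRUCT {
  ?p a ont:Item ;
     ont:label ?name ;
     ont:hasSupplier ?supplier .
}
WHERE {
  ?p a src:Product ;
     src:name ?name ;
     src:suppliedBy ?supplier .
}
"""
integrated = g.query(mapping).graph              # the constructed target-ontology graph
print(integrated.serialize(format="turtle"))
```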


Author(s):  
Alejandro Llaves ◽  
Oscar Corcho ◽  
Peter Taylor ◽  
Kerry Taylor

This paper presents a generic approach to integrating environmental sensor data efficiently, allowing the detection of relevant situations and events in near real-time through continuous querying. Data variety is addressed by using the Semantic Sensor Network ontology for observation data modelling and semantic annotations for environmental phenomena. Data velocity is handled by distributing sensor data messaging and serving observations as RDF graphs on query demand. The stream processing engine presented in the paper, morph-streams++, provides adapters for different data formats and distributed processing of streams in a cluster. An evaluation of different configurations for parallelization and semantic annotation parameters shows that the described approach reduces the average latency of message processing in some cases.
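
A minimal sketch of serving one sensor message as an RDF observation graph, here using SOSA (the observation core of the current SSN ontology) with rdflib. The message layout, namespaces, and property choices are illustrative assumptions, not the morph-streams++ implementation.

```python
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, XSD

SOSA = Namespace("http://www.w3.org/ns/sosa/")   # SSN/SOSA observation vocabulary
EX = Namespace("http://example.org/sensors/")    # hypothetical sensor namespace

def observation_to_rdf(msg):
    """Turn one incoming sensor message (a dict) into a small SOSA observation graph.

    msg is assumed to look like:
    {"id": "obs-001", "sensor": "ws-12", "property": "airTemperature",
     "value": 21.4, "time": "2015-07-01T10:00:00Z"}
    """
    g = Graph()
    obs = EX[msg["id"]]
    g.add((obs, RDF.type, SOSA.Observation))
    g.add((obs, SOSA.madeBySensor, EX[msg["sensor"]]))
    g.add((obs, SOSA.observedProperty, EX[msg["property"]]))
    g.add((obs, SOSA.hasSimpleResult, Literal(msg["value"], datatype=XSD.double)))
    g.add((obs, SOSA.resultTime, Literal(msg["time"], datatype=XSD.dateTime)))
    return g

g = observation_to_rdf({"id": "obs-001", "sensor": "ws-12",
                        "property": "airTemperature",
                        "value": 21.4, "time": "2015-07-01T10:00:00Z"})
```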


2014 ◽  
Vol 08 (03) ◽  
pp. 335-384 ◽  
Author(s):  
Ngan T. Dong ◽  
Lawrence B. Holder

The Resource Description Framework (RDF) is the primary language for describing information on the Semantic Web. The deployment of semantic web search by Google and Microsoft and the Linked Open Data community project, along with the announcement of schema.org by Yahoo, Bing and Google, have significantly fostered the generation of data available in RDF format. Yet RDF is a machine-oriented representation of data and is thus hard for non-expert users to understand. We propose a Natural Language Generation (NLG) engine to generate English text from a small RDF graph. The Natural Language Generation from Graphs (NLGG) system uses an ontology skeleton, which contains hierarchies of concepts, relationships and attributes, along with handcrafted template information as its knowledge base. We performed two experiments to evaluate NLGG. First, NLGG was tested with RDF graphs extracted from four ontologies in different domains, with a Simple Verbalizer used for comparison; NLGG consistently outperformed the Simple Verbalizer in all test cases. In the second experiment, we compared the effort required to make NLGG and NaturalOWL work with the M-PIRO ontology. Results show that NLGG generates acceptable text with much less effort.
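
A much-simplified illustration of predicate-level templates in the spirit of NLGG's handcrafted template knowledge base, written with rdflib. The ontology namespace, templates, and label extraction are hypothetical; the actual system also exploits the ontology skeleton's hierarchies of concepts, relationships, and attributes.

```python
from rdflib import Graph, Namespace, Literal

EX = Namespace("http://example.org/zoo/")   # hypothetical ontology namespace

# Handcrafted sentence templates keyed by predicate.
TEMPLATES = {
    EX.eats:    "{s} eats {o}.",
    EX.livesIn: "{s} lives in {o}.",
    EX.weight:  "{s} weighs {o} kilograms.",
}

def label(node):
    """Naive label extraction: last path segment of a URI, or the literal value."""
    return str(node) if isinstance(node, Literal) else str(node).rsplit("/", 1)[-1]

def verbalize(g):
    """Produce one English sentence per triple whose predicate has a template."""
    sentences = []
    for s, p, o in g:
        if p in TEMPLATES:
            sentences.append(TEMPLATES[p].format(s=label(s), o=label(o)))
    return " ".join(sentences)

g = Graph()
g.add((EX.Lion, EX.eats, EX.Zebra))
g.add((EX.Lion, EX.livesIn, EX.Savannah))
print(verbalize(g))   # e.g. "Lion eats Zebra. Lion lives in Savannah." (triple order may vary)
```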


2014 ◽  
Vol 2014 ◽  
pp. 1-7 ◽  
Author(s):  
Ramalingam Gomathi ◽  
Dhandapani Sharmila

The rapid, day-by-day growth in the number of web pages has driven the development of semantic web technology. A World Wide Web Consortium (W3C) standard for storing semantic web data is the resource description framework (RDF). To reduce the execution time of queries over large RDF graphs, evolving metaheuristic algorithms have become an alternative to traditional query optimization methods. This paper focuses on the problem of query optimization for semantic web data. An efficient algorithm called adaptive cuckoo search (ACS) is designed in this research for querying large RDF graphs and generating optimal query plans. Experiments were conducted on different datasets with varying numbers of predicates. The experimental results show that the proposed approach delivers significant improvements in query execution time. The extent to which the algorithm is efficient is tested and the results are documented.
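
A toy, discrete cuckoo-search sketch for ordering triple patterns, where a swap of two positions stands in for the Lévy-flight step and the cost model is a fake selectivity estimate. Parameter values, the neighbourhood move, and the cost function are illustrative assumptions, not the ACS algorithm from the paper.

```python
import random

def cuckoo_search_plan(patterns, cost, n_nests=15, pa=0.25, generations=200):
    """Small discrete cuckoo-search sketch for ordering triple patterns.

    patterns : list of triple patterns (any hashable items)
    cost     : function mapping an ordering (tuple) to an estimated execution cost
    Returns the cheapest ordering found.
    """
    def mutate(order):
        order = list(order)
        i, j = random.sample(range(len(order)), 2)   # swap two positions
        order[i], order[j] = order[j], order[i]
        return tuple(order)

    nests = [tuple(random.sample(patterns, len(patterns))) for _ in range(n_nests)]
    for _ in range(generations):
        # A cuckoo lays a new candidate derived from a random nest; it replaces the worst nest if cheaper.
        new = mutate(random.choice(nests))
        worst = max(range(n_nests), key=lambda k: cost(nests[k]))
        if cost(new) < cost(nests[worst]):
            nests[worst] = new
        # Abandon a fraction pa of the worst nests and rebuild them randomly.
        nests.sort(key=cost)
        for k in range(int((1 - pa) * n_nests), n_nests):
            nests[k] = tuple(random.sample(patterns, len(patterns)))
    return min(nests, key=cost)

# Illustrative use with a fake selectivity-based cost model (lower selectivity placed earlier is cheaper):
selectivity = {"?x type Person": 0.5, "?x worksAt ?y": 0.1, "?y locatedIn Berlin": 0.01}
plan = cuckoo_search_plan(list(selectivity),
                          cost=lambda order: sum((i + 1) * selectivity[p]
                                                 for i, p in enumerate(reversed(order))))
```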

