MuSe: a multi-level storage scheme for big RDF data using MapReduce

2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Tanvi Chawla ◽  
Girdhari Singh ◽  
Emmanuel S. Pilli

The Resource Description Framework (RDF) model, owing to its flexible structure, is increasingly being used to represent Linked Data. The rise in the amount of Linked Data and knowledge graphs has resulted in an increase in the volume of RDF data. RDF is used to model metadata, especially for social media domains where the data is linked. With the plethora of RDF data sources available on the Web, scalable RDF data management becomes a tedious task. In this paper, we present MuSe, an efficient distributed RDF storage scheme for storing and querying RDF data with Hadoop MapReduce. In MuSe, the Big RDF data is stored at two levels for answering the common triple patterns in SPARQL queries. MuSe considers the types of frequently occurring triple patterns and optimizes RDF storage to answer such triple patterns in minimum time. It accesses only the tables that are sufficient for answering a triple pattern instead of scanning the whole RDF dataset. Extensive experiments on two synthetic RDF datasets, LUBM and WatDiv, show that MuSe outperforms the compared state-of-the-art frameworks in terms of query execution time and scalability.
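The idea of reading only the tables a triple pattern needs can be illustrated with a small in-memory sketch. It is not MuSe's actual layout, which the abstract does not specify: the two levels used here (one table per predicate, then one sub-table per predicate and object) and the names `build_store` and `match_pattern` are assumptions for illustration, and plain Python dictionaries stand in for files on Hadoop.

```python
from collections import defaultdict


def build_store(triples):
    """Partition triples into two hypothetical levels of tables."""
    by_predicate = defaultdict(list)         # level 1: predicate -> [(s, o)]
    by_predicate_object = defaultdict(list)  # level 2: (predicate, object) -> [s]
    for s, p, o in triples:
        by_predicate[p].append((s, o))
        by_predicate_object[(p, o)].append(s)
    return by_predicate, by_predicate_object


def match_pattern(store, s=None, p=None, o=None):
    """Answer a triple pattern with a bound predicate; None marks a variable."""
    by_predicate, by_predicate_object = store
    if p is None:
        raise ValueError("this sketch only handles patterns with a bound predicate")
    if o is not None:
        # Predicate and object bound: read a single level-2 table.
        subjects = by_predicate_object.get((p, o), [])
        return [(x, p, o) for x in subjects if s is None or x == s]
    # Only the predicate (and possibly the subject) bound: read one level-1 table.
    return [(x, p, y) for x, y in by_predicate.get(p, []) if s is None or x == s]


# Toy data, invented for this example.
triples = [
    ("alice", "rdf:type", "Student"),
    ("bob", "rdf:type", "Professor"),
    ("alice", "advisor", "bob"),
]
store = build_store(triples)
students = match_pattern(store, p="rdf:type", o="Student")
advised = match_pattern(store, p="advisor")
```

In this sketch a pattern such as `?x rdf:type Student` touches one level-2 table and never reads the `advisor` triples, which is the kind of saving the abstract describes.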

Author(s):  
Ala Djeddai ◽  
Hassina Seridi-Bouchelaghem ◽  
Med Tarek Khadir

Given the lack of knowledge about the structure of Resource Description Framework (RDF) data, difficulties arise principally in specifying and answering queries. Approximate querying addresses this by finding relevant information as a set of substructures (e.g. subgraphs) that match the query. Existing approaches, whether based on structure or on semantics, have marginalized the common meaning between concepts in their computations. This paper proposes improving the approximation by introducing a meaning similarity between the components of the query and the RDF components, so that user needs are better satisfied. The meaning similarity measure can be calculated using WordNet and used in all steps of the query answering process. In addition, other important properties are considered when calculating the approximation level between query paths and RDF paths, and indexing and optimization strategies are applied. Answers are a set of subgraphs ranked in decreasing order of their matching degree. Experiments are conducted on a real RDF dataset.


2017 ◽  
Vol 44 (2) ◽  
pp. 203-229 ◽  
Author(s):  
Javier D Fernández ◽  
Miguel A Martínez-Prieto ◽  
Pablo de la Fuente Redondo ◽  
Claudio Gutiérrez

The publication of semantic web data, commonly represented in Resource Description Framework (RDF), has experienced outstanding growth over the last few years. Data from all fields of knowledge are shared publicly and interconnected in active initiatives such as Linked Open Data. However, despite the increasing availability of applications managing large-scale RDF information such as RDF stores and reasoning tools, little attention has been given to the structural features emerging in real-world RDF data. Our work addresses this issue by proposing specific metrics to characterise RDF data. We specifically focus on revealing the redundancy of each data set, as well as common structural patterns. We evaluate the proposed metrics on several data sets, which cover a wide range of designs and models. Our findings provide a basis for more efficient RDF data structures, indexes and compressors.


Author(s):  
Zongmin Ma ◽  
Li Yan

The resource description framework (RDF) is a model for representing information resources on the web. With the widespread acceptance of RDF as the de facto standard recommended by W3C (World Wide Web Consortium) for the representation and exchange of information on the web, a huge amount of RDF data is being produced and made available. RDF data management is therefore of increasing importance and has attracted attention in the database community as well as the Semantic Web community. Much work has been devoted to proposing different solutions for storing large-scale RDF data efficiently. In order to manage massive RDF data, NoSQL (not only SQL) databases have been used for scalable RDF data storage. This chapter focuses on using various NoSQL databases to store massive RDF data. An up-to-date overview of the current state of the art in RDF data storage in NoSQL databases is provided. The chapter also aims to offer suggestions for future research.


Author(s):  
Zongmin Ma ◽  
Li Yan

The Resource Description Framework (RDF) is a model for representing information resources on the Web. With the widespread acceptance of RDF as the de facto standard recommended by W3C (World Wide Web Consortium) for the representation and exchange of information on the Web, a huge amount of RDF data is being produced and made available. RDF data management is therefore of increasing importance and has attracted attention in the database community as well as the Semantic Web community. Much work has been devoted to proposing different solutions for storing large-scale RDF data efficiently. In order to manage massive RDF data, NoSQL (“not only SQL”) databases have been used for scalable RDF data storage. This chapter focuses on using various NoSQL databases to store massive RDF data. An up-to-date overview of the current state of the art in RDF data storage in NoSQL databases is provided. The chapter also aims to offer suggestions for future research.


2018 ◽  
Vol 10 (8) ◽  
pp. 2613
Author(s):  
Dandan He ◽  
Zhongfu Li ◽  
Chunlin Wu ◽  
Xin Ning

Industrialized construction has raised the requirements of procurement methods used in the construction industry. The rapid development of e-commerce offers efficient and effective solutions; however, the large number of participants in the construction industry means that the data involved are complex, and problems arise related to volume, heterogeneity, and fragmentation. Thus, the sector lags behind others in the adoption of e-commerce. In particular, data integration has become a barrier preventing further development. Traditional e-commerce platforms, which consider data integration only for common product data, cannot meet the requirements of construction product data integration. This study aimed to build an information-integrated e-commerce platform for industrialized construction procurement (ICP) to overcome some of the shortcomings of existing platforms. We proposed a platform based on Building Information Modelling (BIM) and linked data, taking an innovative approach to data integration. It uses industrialized construction technology to support product standardization, BIM to support the procurement process, and linked data to connect different data sources. The platform was validated using a case study. With the development of an e-commerce ontology, industrialized construction component information was extracted from BIM models and converted to Resource Description Framework (RDF) format. Related information from different data sources was also converted to RDF format, and SPARQL Protocol and RDF Query Language (SPARQL) queries were implemented. The platform provides a solution for the development of e-commerce platforms in the construction industry.


Author(s):  
E. Hietanen ◽  
L. Lehto ◽  
P. Latvala

In this study, a prototype service to provide data from a Web Feature Service (WFS) as linked data is implemented. First, persistent and unique Uniform Resource Identifiers (URIs) are created for all spatial objects in the dataset. The objects are available from those URIs in the Resource Description Framework (RDF) data format. Next, a Web Ontology Language (OWL) ontology is created to describe the information content of the dataset using the Open Geospatial Consortium’s (OGC) GeoSPARQL vocabulary. The existing data model is modified in order to take the linked data principles into account. The implemented service produces an HTTP response dynamically. The data for the response is first fetched from the existing WFS. Then the Geography Markup Language (GML) output of the WFS is transformed on the fly to the RDF format. Content negotiation is used to serve the data in different RDF serialization formats. This solution facilitates the use of a dataset in different applications without replicating the whole dataset. In addition, individual spatial objects in the dataset can be referred to by URIs. Furthermore, the needed information content of the objects can be easily extracted from the RDF serializations available from those URIs.

A solution for linking data objects to the dataset URI is also introduced by using the Vocabulary of Interlinked Datasets (VoID). The dataset is divided into subsets, and each subset is given its own persistent and unique URI. This enables the whole dataset to be explored with a web browser and all individual objects to be indexed by search engines.
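Serving the same object in different RDF serializations through content negotiation can be sketched as follows. This is an illustrative simplification, not the service described above: the function names, the Turtle default, and the format labels are assumptions, and the matching ignores the media-range specificity rules of full HTTP content negotiation, ordering candidates by `q` value only.

```python
# Media types registered for common RDF serializations; the labels on the
# right are hypothetical serializer names chosen for this sketch.
RDF_FORMATS = {
    "text/turtle": "turtle",
    "application/rdf+xml": "xml",
    "application/ld+json": "json-ld",
    "application/n-triples": "nt",
}
DEFAULT_MEDIA_TYPE = "text/turtle"  # assumed default when the client accepts anything


def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, highest q first."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means full preference
        for field in fields[1:]:
            name, _, value = field.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        entries.append((media_type, q))
    # sorted() is stable, so entries with equal q keep their header order.
    return sorted(entries, key=lambda e: e[1], reverse=True)


def negotiate(header):
    """Pick the RDF media type to serve, or None if nothing is acceptable."""
    for media_type, q in parse_accept(header or "*/*"):
        if q <= 0:
            continue
        if media_type in RDF_FORMATS:
            return media_type
        if media_type == "*/*":
            return DEFAULT_MEDIA_TYPE
    return None
```

A server built along these lines would respond with HTTP status 406 (Not Acceptable) when `negotiate` returns `None`, and otherwise serialize the object's RDF in the chosen format.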


2020 ◽  
Vol 25 (6) ◽  
pp. 793-801
Author(s):  
Maturi Sreerama Murty ◽  
Nallamothu Nagamalleswara Rao

Following the accessibility of Resource Description Framework (RDF) resources is a key capacity in the establishment of Linked Data frameworks. It replaces center around information reconciliation contrasted with work rate. Exceptional Connected Data that empowers applications to improve by changing over legacy information into RDF resources. This data contains bibliographic, geographic, government, arrangement, and alternate routes. Regardless, a large portion of them don't monitor the subtleties and execution of each sponsored resource. In such cases, it is vital for those applications to track, store and scatter provenance information that mirrors their source data and introduced tasks. We present the RDF information global positioning framework. Provenance information is followed during the progress cycle and oversaw multiple times. From that point, this data is appropriated utilizing of this concept URIs. The proposed design depends on the Harvard Library Database. The tests were performed on informational indexes with changes made to the qualities in the RDF and the subtleties related with the provenance. The outcome has quieted the guarantee as in it pulls in record wholesalers to make significant realities that develop while taking almost no time and exertion.


Author(s):  
Hatem Soliman ◽  
Izhar Ahmed Khan ◽  
Yasir Hussain

The resource description framework (RDF) was adopted by the World Wide Web Consortium (W3C) as an essential semantic web standard and the RDF scheme. It accords the hard semantics in the description and wields the crisp metadata. However, it usually produces vague or ambiguous information. Consequently, fuzzy RDF helps deal with such special data by transforming the crisp values into a fuzzy set. A method for analyzing fuzzy RDF data is proposed in this paper. To this end, first, we decompose the RDF into fuzzy RDF variables. Second, we design a model for global sensitivity analysis based on the decomposition of fuzzy RDF. It figures out the ambiguities of fuzzy RDF data. The proposed global sensitivity analysis model provides the importance of fuzzy RDF data by considering the response function’s structure and reselects it to a certain degree. A practical tool for sensitivity analysis of fuzzy RDF data has also been implemented based on the proposed model.


Author(s):  
Waqas Ali ◽  
Muhammad Saleem ◽  
Bin Yao ◽  
Axel-Cyrille Ngonga Ngomo

Recent advances in the Semantic Web and Linked Data have changed the way the traditional web works. The Resource Description Framework (RDF) format has been widely adopted for storing web-based data. This massive adoption has paved the way for the development of various centralized and distributed RDF processing engines. These engines employ different mechanisms to implement key components of query processing, such as data storage, indexing, language support, and query execution. All these components govern how queries are executed and can have a substantial effect on the query runtime. For example, the way RDF data is stored significantly affects the storage space required and the query runtime performance. The type of indexing approach used in RDF engines is key for fast data lookup. The type of the underlying query language (e.g., SPARQL or SQL) used for query execution is a key optimization component of RDF storage solutions. Finally, query execution involving different join orders significantly affects the query response time. This paper provides a comprehensive review of centralized and distributed RDF engines in terms of storage, indexing, language support, and query execution.


2016 ◽  
Vol 35 (1) ◽  
pp. 51 ◽  
Author(s):  
Juliet L. Hardesty

Metadata, particularly within the academic library setting, is often expressed in eXtensible Markup Language (XML) and managed with XML tools, technologies, and workflows. Managing a library’s metadata currently takes on a greater level of complexity as libraries are increasingly adopting the Resource Description Framework (RDF). Semantic Web initiatives are surfacing in the library context with experiments in publishing metadata as Linked Data sets and also with development efforts such as BIBFRAME and the Fedora 4 Digital Repository incorporating RDF. Use cases show that transitions into RDF are occurring in both XML standards and in libraries with metadata encoded in XML. It is vital to understand that transitioning from XML to RDF requires a shift in perspective from replicating structures in XML to defining meaningful relationships in RDF. Establishing coordination and communication among these efforts will help as more libraries move to use RDF, produce Linked Data, and approach the Semantic Web.

