When is the Peak Performance Reached? An Analysis of RDF Triple Stores

2021
Author(s):
Hashim Khan
Manzoor Ali
Axel-Cyrille Ngonga Ngomo
Muhammad Saleem

With the significant growth in RDF datasets, application developers demand online availability of these datasets to meet end users' expectations. Various interfaces are available for querying RDF data using the SPARQL query language. Studies show that SPARQL endpoints may provide high query runtime performance at the cost of low availability: for example, it has been observed that only 32.2% of public endpoints have a monthly uptime of 99–100%. One possible reason for this low availability is the high workload experienced by these SPARQL endpoints. Because complete query execution is performed on the server side (i.e., at the SPARQL endpoint), this high query processing workload may result in performance degradation or even a service shutdown. We performed extensive experiments to measure the query processing capabilities of well-known triple stores via their SPARQL endpoints. In particular, we stressed these triple stores with multiple parallel requests from different querying agents. Our experiments reveal the maximum query processing capacity of these triple stores, beyond which they experience service shutdowns. We hope this analysis will help triple store developers design workload-aware RDF engines that improve the availability of their public endpoints while maintaining high throughput.
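The stress-testing setup described in the abstract can be approximated with a few lines of client code. The following Python sketch is only an illustration of the idea, not the authors' benchmark harness; the endpoint URL, query, agent count, and request count are placeholder assumptions.

```python
# Minimal sketch of stressing a SPARQL endpoint with parallel querying agents.
# The endpoint URL, query, agent count, and request count are placeholders.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

ENDPOINT = "http://localhost:8890/sparql"            # hypothetical endpoint
QUERY = "SELECT * WHERE { ?s ?p ?o } LIMIT 1000"
AGENTS = 32                                           # parallel querying agents
REQUESTS_PER_AGENT = 100


def agent(agent_id: int) -> tuple[int, int, float]:
    """Fire a fixed number of queries; return (succeeded, failed, elapsed seconds)."""
    ok = failed = 0
    start = time.perf_counter()
    for _ in range(REQUESTS_PER_AGENT):
        try:
            r = requests.post(
                ENDPOINT,
                data={"query": QUERY},
                headers={"Accept": "application/sparql-results+json"},
                timeout=30,
            )
            if r.status_code == 200:
                ok += 1
            else:
                failed += 1
        except requests.RequestException:
            failed += 1          # timeouts and refused connections count as failures
    return ok, failed, time.perf_counter() - start


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=AGENTS) as pool:
        results = list(pool.map(agent, range(AGENTS)))
    print("succeeded:", sum(r[0] for r in results),
          "failed:", sum(r[1] for r in results))
```

Raising the agent count until the endpoint starts returning errors or stops responding gives a rough, client-side view of the saturation point the paper investigates.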

Author(s):  
Kjetil Nørvåg

The amount of data available in XML is rapidly increasing, and at the same time the price of mass storage is rapidly decreasing, which makes it possible to store larger amounts of data. The contents of a database or data warehouse are seldom static. New documents are created, documents are deleted and, more importantly, documents are updated. In many cases, one wants to be able to search in historical (old) versions, retrieve documents that were valid at a certain time, query changes to documents, and so forth. (Note that although this process is somewhat similar to general document versioning maintenance, the aspect of time makes the possibilities and appropriate solutions different.) The "easiest" way to do this is to store all versions of all documents in the database and use a middleware layer to convert temporal query language statements into conventional statements, executed by an underlying database system (an example of such a system is TeXOR; Nørvåg, Limstrand, & Myklebust, 2003). Although this approach makes the introduction of temporal support easier, it can be difficult to achieve good performance: temporal query processing is in general costly, and the cost of storing complete document versions can be high. Thus, a temporal XML database system is necessary.
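The middleware approach mentioned above can be illustrated with a toy rewrite: a temporal predicate over document versions is translated into a conventional query against a version table. This is only a sketch of the general idea, not TeXOR's actual translation; the table and column names (doc_versions, valid_from, valid_to) are invented for the example.

```python
# Toy illustration of the middleware idea: a temporal predicate over document
# versions is rewritten into a conventional query against a version table.
# The table and column names (doc_versions, valid_from, valid_to) are invented.

def rewrite_valid_at(doc_id: str, timestamp: str) -> str:
    """Return a conventional SQL query for 'the version of doc_id valid at timestamp'."""
    return (
        "SELECT content FROM doc_versions "
        f"WHERE doc_id = '{doc_id}' "
        f"AND valid_from <= '{timestamp}' AND valid_to > '{timestamp}'"
    )

print(rewrite_valid_at("report.xml", "2003-06-01"))
```

The performance cost the abstract points to comes from exactly this kind of rewriting: every temporal predicate turns into extra range conditions and joins over full document versions, which a native temporal XML database can avoid.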


2021
Vol 48 (4)
pp. 3-3
Author(s):  
Ingo Weber

Blockchain is a novel distributed ledger technology. Through its features and smart contract capabilities, a wide range of application areas has opened up for blockchain-based innovation [5]. In order to analyse how concrete blockchain systems as well as blockchain applications are used, data must be extracted from these systems. Due to various complexities inherent in blockchain, the question of how to interpret such data is non-trivial. Such interpretation should often be shared among parties, e.g., if they collaborate via a blockchain. To this end, we devised an approach to codify the interpretation of blockchain data, to extract data from blockchains accordingly, and to output it in suitable formats [1, 2]. This work will be the main topic of the keynote. In addition, application developers and users of blockchain applications may want to estimate the cost of using or operating a blockchain application. In the keynote, I will also discuss our cost estimation method [3, 4]. This method was designed for the Ethereum blockchain platform, where cost also relates to transaction complexity, and therefore also to system throughput.
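As a rough illustration of the cost dimension on Ethereum, a transaction's fee is essentially the gas it consumes multiplied by the gas price, converted via the ETH exchange rate. The sketch below shows only this basic arithmetic with placeholder figures; it is not the cost estimation method of [3, 4].

```python
# Back-of-the-envelope Ethereum transaction cost arithmetic (not the method of [3, 4]).
# All figures below are placeholders.

GWEI_PER_ETH = 1_000_000_000

def tx_cost_usd(gas_used: int, gas_price_gwei: float, eth_price_usd: float) -> float:
    """Cost in USD = gas used * gas price (converted to ETH) * ETH/USD rate."""
    cost_eth = gas_used * gas_price_gwei / GWEI_PER_ETH
    return cost_eth * eth_price_usd

# Example: a contract call consuming 100,000 gas at 30 gwei, with ETH at $2,000.
print(f"{tx_cost_usd(100_000, 30, 2_000):.2f} USD")   # 6.00 USD
```

Because gas consumption grows with transaction complexity, and block gas limits cap how much gas fits in a block, the same quantity that drives cost also bounds system throughput, which is the link the keynote draws.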


Author(s):  
Omar Shehab
Ali Hussein Saleh Zolait

In this paper, the authors propose a semantic search engine that retrieves software components precisely and uses ontology technology to store these components in a database. The engine uses a semantic query language to retrieve these components semantically. The authors use an exploratory study in which the proposed method maps object-oriented concepts to the Web Ontology Language. Qualitative survey and interview techniques were used to collect data. The outcome of this research is a set of guidelines, a model, and a prototype describing the semantic search engine system. The guidelines help software developers and companies reduce the cost, time, and risks of software development.
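The mapping between object-oriented concepts and the Web Ontology Language can be sketched informally: classes map to owl:Class, inheritance to rdfs:subClassOf, and attributes to datatype properties. The snippet below, using the rdflib library, is an assumed illustration of such a mapping, not the authors' prototype; the namespace and component names are invented.

```python
# Illustrative mapping of object-oriented concepts to OWL using rdflib.
# The ontology namespace and component names are invented for this sketch.
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF, RDFS

EX = Namespace("http://example.org/components#")
g = Graph()
g.bind("ex", EX)

# OO class       -> owl:Class
g.add((EX.SortingComponent, RDF.type, OWL.Class))
# Inheritance    -> rdfs:subClassOf
g.add((EX.QuickSortComponent, RDF.type, OWL.Class))
g.add((EX.QuickSortComponent, RDFS.subClassOf, EX.SortingComponent))
# OO attribute   -> owl:DatatypeProperty with a domain
g.add((EX.implementationLanguage, RDF.type, OWL.DatatypeProperty))
g.add((EX.implementationLanguage, RDFS.domain, EX.SortingComponent))

print(g.serialize(format="turtle"))
```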


Author(s):  
Arijit Sengupta
Ramesh Venkataraman

This chapter introduces a complete storage and retrieval architecture for a database environment for XML documents. DocBase, a prototype system based on this architecture, uses a flexible storage and indexing technique to allow highly expressive queries without the need to map documents to other database formats. DocBase integrates several techniques: (i) a formal model called Heterogeneous Nested Relations (HNR), (ii) a conceptual model, XER (Extensible Entity Relationship), (iii) formal query languages (Document Algebra and Calculus), (iv) a practical query language (Document SQL, or DSQL), (v) a visual query formulation method, QBT (Query By Templates), and (vi) the DocBase query processing architecture. The chapter focuses on the overall architecture of DocBase, including implementation details, describes the query-processing framework, and presents results from various performance tests. It summarizes experimental and usability analyses to demonstrate its feasibility as a general architecture for native as well as embedded document manipulation methods.


2016
Vol 2016 (4)
pp. 202-218
Author(s):  
Ryan Henry

Private information retrieval (PIR) is a way for clients to query a remote database without the database holder learning the clients' query terms or the responses they generate. Compelling applications for PIR abound in the cryptographic and privacy research literature, yet existing PIR techniques are notoriously inefficient. Consequently, no PIR-based application to date has seen real-world at-scale deployment. This paper proposes new "batch coding" techniques to help address PIR's efficiency problem. The new techniques exploit the connection between ramp secret sharing schemes and efficient information-theoretically secure PIR (IT-PIR) protocols. This connection was previously observed by Henry, Huang, and Goldberg (NDSS 2013), who used ramp schemes to construct efficient "batch queries" with which clients can fetch several database records for the same cost as fetching a single record using a standard, non-batch query. The new techniques in this paper generalize and extend those of Henry et al. to construct "batch codes" with which clients can fetch several records for only a fraction of the cost of fetching a single record using a standard non-batch query over an unencoded database. The batch codes are highly tunable, providing a means to trade off (i) lower server-side computation cost, (ii) lower server-side storage cost, and/or (iii) lower uni- or bi-directional communication cost, in exchange for a comparatively modest decrease in resilience to Byzantine database servers.
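For readers unfamiliar with IT-PIR, the classic two-server XOR scheme, which batch queries and batch codes generalize, can be sketched in a few lines. The toy version below handles a single record per query and is not the ramp-scheme construction from the paper.

```python
# Toy two-server information-theoretic PIR over an n-record database of
# equal-length records. Each server sees a uniformly random query vector,
# so neither learns the requested index on its own. This is plain single-record
# IT-PIR, not the batch-code construction described in the paper.
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def client_queries(n: int, index: int) -> tuple[list[int], list[int]]:
    """Split the selection vector for `index` into two random-looking shares."""
    q1 = [secrets.randbelow(2) for _ in range(n)]
    q2 = q1.copy()
    q2[index] ^= 1                     # the shares differ only at the target index
    return q1, q2

def server_answer(db: list[bytes], q: list[int]) -> bytes:
    """XOR together the records selected by the query vector."""
    ans = bytes(len(db[0]))
    for record, bit in zip(db, q):
        if bit:
            ans = xor_bytes(ans, record)
    return ans

db = [b"rec0", b"rec1", b"rec2", b"rec3"]          # toy database
q1, q2 = client_queries(len(db), index=2)
print(xor_bytes(server_answer(db, q1), server_answer(db, q2)))   # b'rec2'
```

Each server touches the whole database to answer one query; batch queries and batch codes aim to amortize that per-query work across several fetched records.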


2010
Vol 08 (02)
pp. 247-293
Author(s):
Ali Cakmak
Gultekin Ozsoyoglu
Richard W. Hanson

Metabolism is a representation of the biochemical principles that govern the production, consumption, degradation, and biosynthesis of metabolites in living cells. Organisms respond to changes in their physiological conditions or environmental perturbations (i.e., constraints) via cooperative implementation of such principles. Querying the inner working principles of metabolism under different constraints provides invaluable insights for both researchers and educators. In this paper, we propose a metabolism query language (MQL) and discuss its query processing. MQL enables researchers to explore the behavior of metabolism with a wide range of predicates, including dietary and physiological condition specifications. The query results of MQL are enriched with both textual and visual representations, and its query processing is tailored to the underlying metabolic principles.


Author(s):  
Shi-Kuo Chang
Gennaro Costagliola
Erland Jungert
Karin Camara

Sensor data fusion imposes a number of novel requirements on query languages and query processing techniques. A spatial/temporal query language called ΣQL has been proposed to support the retrieval of multimedia information from multiple sources and databases. This chapter investigates intelligent querying techniques, including fusion techniques, multimedia data transformations, interactive progressive query building, and ΣQL query processing techniques using sensor data fusion. The authors illustrate and discuss tasks and query patterns for information fusion, provide a number of examples of iterative queries, and show the effectiveness of ΣQL in a command-action scenario.


Author(s):  
Rui Peng
Alex J. Aved
Kien A. Hua

With the proliferation of inexpensive cameras and the availability of high-speed wired and wireless networks, systems of distributed cameras are becoming an enabling technology for a broad range of interdisciplinary applications in domains such as public safety and security, manufacturing, transportation, and healthcare. Today’s live video processing systems on networks of distributed cameras, however, are designed for specific classes of applications. To provide a generic query processing platform for applications of distributed camera networks, the authors designed and implemented a new class of general purpose database management systems, the live video database management system (LVDBMS). The authors view networked video cameras as a special class of interconnected storage devices, and allow the user to formulate ad hoc queries expressed over real-time live video feeds. This paper introduces their system and presents the live video data model, the query language, and the query processing and optimization technique.
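To make the notion of an ad hoc query over live feeds concrete, the sketch below simulates a continuous query loop over incoming frames. It is an assumed illustration only, not the LVDBMS query language or engine; the frame source and the motion-score predicate are stand-ins for a real operator pipeline.

```python
# Illustrative continuous-query loop over a (simulated) live camera feed.
# This is not the LVDBMS language or engine; the frame source and the
# motion-score predicate are stand-ins for a real operator pipeline.
import random
import time
from typing import Iterator

def camera_feed(camera_id: str) -> Iterator[dict]:
    """Simulate a live feed: yield one frame descriptor per time step."""
    frame_no = 0
    while True:
        yield {"camera": camera_id, "frame": frame_no,
               "motion_score": random.random(), "ts": time.time()}
        frame_no += 1

def continuous_query(feed: Iterator[dict], threshold: float, max_results: int) -> list[dict]:
    """Conceptually: SELECT frame FROM feed WHERE motion_score > threshold."""
    hits = []
    for frame in feed:
        if frame["motion_score"] > threshold:
            hits.append(frame)
            if len(hits) >= max_results:
                return hits
    return hits

for hit in continuous_query(camera_feed("lobby-cam"), threshold=0.9, max_results=3):
    print(hit["camera"], hit["frame"], round(hit["motion_score"], 2))
```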


Author(s):  
Jingwei Cheng
Z. M. Ma
Qiang Tong

RDF plays an important role in representing Web resources in a natural and flexible way. As the amount of RDF data grows, storing and querying these data have attracted the attention of more and more researchers. In this chapter, we first review approaches for query processing of RDF datasets. We categorize existing methods into two classes: those that use an RDBMS to implement storage and retrieval, and those that devise their own native storage schemas, called relational RDF stores and native stores, respectively. Secondly, we survey some important extensions of SPARQL, the standard query language for RDF, which extend its expressive power with more sophisticated language constructs that meet the needs of various application scenarios.
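The relational RDF store category can be illustrated with the simplest possible schema, a single triples table, together with the SQL that a basic SPARQL graph pattern translates to. The sketch below uses SQLite for self-containment and is a deliberate simplification; production systems typically add dictionary encoding, property tables, and extensive indexing.

```python
# Minimal illustration of a relational RDF store: a single triples table and the
# SQL translation of a simple SPARQL basic graph pattern. Production systems add
# dictionary encoding, property tables, and extensive indexing; this is a toy.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:alice", "foaf:knows", "ex:bob"),
        ("ex:bob",   "foaf:name",  "Bob"),
        ("ex:alice", "foaf:name",  "Alice"),
    ],
)

# SPARQL:  SELECT ?name WHERE { ex:alice foaf:knows ?x . ?x foaf:name ?name }
# becomes a self-join over the triples table:
rows = conn.execute(
    """
    SELECT t2.o AS name
    FROM triples t1 JOIN triples t2 ON t1.o = t2.s
    WHERE t1.s = 'ex:alice' AND t1.p = 'foaf:knows' AND t2.p = 'foaf:name'
    """
).fetchall()
print(rows)   # [('Bob',)]
```

Each additional triple pattern in a SPARQL query adds another self-join, which is why the surveyed systems invest so heavily in alternative storage layouts and indexes.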

