Hybridation of Labeling Schemes for Efficient Dynamic Updates

Author(s):  
Su Cheng Haw ◽  
Samini Subramaniam ◽  
Wei Siang Lim ◽  
Fang Fang Chua

With XML as the leading standard for data representation on the Web, storing and querying XML data efficiently is crucial. However, relational databases remain the dominant database technology in most organizations, so replacing a relational database with a pure XML database is rarely a wise choice. One of the most prominent solutions is to map XML into a relational database. This paper introduces a robust hybrid labeling scheme that combines the strengths of the extended range and ORDPATH schemes to support dynamic updates. In addition, we propose a mapping scheme based on the hybrid labeling scheme. The proposed approach is evaluated in terms of (i) loading time, (ii) storage size, (iii) query retrieval time, and (iv) dynamic update time, and compared with the ORDPATH and ME schemes. The experimental results show that the proposed approach scales to huge datasets and supports dynamic updates.
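To make the range half of such a hybrid concrete, here is a minimal sketch of plain range (containment) labeling. It is not the paper's hybrid scheme: the `(start, end, level)` triple, the function names and the sample document are all assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET


def range_label(root):
    """Assign a (start, end, level) label to every element in one depth-first pass."""
    labels = {}
    counter = 0

    def visit(node, level):
        nonlocal counter
        counter += 1
        start = counter          # number taken on entering the node
        for child in node:
            visit(child, level + 1)
        counter += 1             # number taken on leaving the node
        labels[node] = (start, counter, level)

    visit(root, 0)
    return labels


def is_ancestor(a, d):
    """a is an ancestor of d when a's interval strictly contains d's interval."""
    return a[0] < d[0] and d[1] < a[1]


def is_parent(a, d):
    """A parent is an ancestor exactly one level above."""
    return is_ancestor(a, d) and d[2] == a[2] + 1


# Hypothetical sample document; tag names are unique so they can be used as keys.
doc = ET.fromstring("<dblp><article><title/><year/></article><book/></dblp>")
labels = range_label(doc)
by_tag = {node.tag: label for node, label in labels.items()}
```

In this sketch structural relationships reduce to integer comparisons, which is what makes such labels easy to store and query in relational columns. The weakness is that the numbers are consecutive, so inserting a node forces re-labeling, which is the problem that gap-based and ORDPATH-style labels address.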

2014 ◽  
Vol 25 (4) ◽  
pp. 38-65
Author(s):  
Yongkwon Kim ◽  
Heejung Yang ◽  
Chin-Wan Chung

Modeling and simulation (M&S) are widely used for the design, analysis, and optimization of complex systems and natural phenomena in areas such as the defense industry and weather systems. In many cases, the environment is a key part of these systems and phenomena: it comprises the physical aspects of the real world that provide the context for a specific simulation. Recently, simulation systems have increasingly been integrated to work together when they need to exchange information. Interoperability of heterogeneous simulations depends heavily on sharing complex environmental data in a consistent and complete manner. SEDRIS (Synthetic Environmental Data Representation and Interchange Specification) is an ISO standard for the representation and interchange of environmental data and is widely adopted in the M&S area. As simulations grow, so does the volume of environmental data that must be exchanged between them, so efficient management of that data is very important. In this paper, the authors propose storage and retrieval methods for SEDRIS transmittals using a relational database system, so that an environmental data server cooperating with many heterogeneous distributed simulations can retrieve data efficiently. Relational database schemas are designed by analyzing the structure and content of SEDRIS transmittals. To reduce query processing time, direct storage and retrieval methods that do not require type conversion of SEDRIS transmittals are proposed. Experimental analyses are conducted to show the efficiency of the proposed approach. The results confirm that it greatly reduces storage and retrieval time relative to the approaches it is compared against.


Extensible Markup Language (XML) is widely used for data exchange and data representation, both online and offline. This structured format can be transformed into other formats and used to share information across platforms. XML is simple, yet it is designed to accommodate change. This paper studies the transformation of XML documents into relational databases. A crucial part of this process is preserving the hierarchy and the relationships between data in the document once it is stored in the database. Each approach discussed in this paper uses its own data storage technique and database design. Each algorithm is therefore assessed on three datasets consisting of small, medium, and large XML files. The efficiency of the algorithms is measured by the time taken for data storage and for query execution. Finally, we discuss the factors that affect algorithm performance and suggest improvements to the mapping schemes for future work.
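As an illustration of how hierarchy can survive the move into tables, the sketch below shreds a document into a single edge-style table using Python's standard `sqlite3` and `xml.etree.ElementTree` modules. It is a generic baseline, not one of the algorithms assessed in the paper; the `node` table, its columns and the sample document are assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred(xml_text, conn):
    """Store every element as one row that points at its parent row."""
    conn.execute(
        "CREATE TABLE node ("
        "id INTEGER PRIMARY KEY, parent_id INTEGER, "
        "ordinal INTEGER, tag TEXT, text TEXT)"
    )
    next_id = 0

    def store(elem, parent_id, ordinal):
        nonlocal next_id
        next_id += 1
        node_id = next_id
        conn.execute(
            "INSERT INTO node VALUES (?, ?, ?, ?, ?)",
            (node_id, parent_id, ordinal, elem.tag, (elem.text or "").strip()),
        )
        # ordinal records document order among siblings
        for position, child in enumerate(elem, start=1):
            store(child, node_id, position)

    store(ET.fromstring(xml_text), None, 1)


conn = sqlite3.connect(":memory:")
shred(
    "<library>"
    "<book><title>XML</title><year>2014</year></book>"
    "<book><title>SQL</title><year>2015</year></book>"
    "</library>",
    conn,
)

# A path such as /library/book/title becomes one self-join per step.
titles = [
    row[0]
    for row in conn.execute(
        "SELECT t.text FROM node b JOIN node t ON t.parent_id = b.id "
        "WHERE b.tag = 'book' AND t.tag = 'title' ORDER BY b.ordinal"
    )
]
node_count = conn.execute("SELECT COUNT(*) FROM node").fetchone()[0]
```

The parent pointer and sibling ordinal are enough to rebuild the tree, but each extra path step costs another self-join, which is one reason query time is a natural metric when comparing mapping schemes.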


2015 ◽  
Vol 6 (4) ◽  
pp. 1-19 ◽  
Author(s):  
Negin Keivani ◽  
Abdelsalam M. Maatuk ◽  
Shadi Aljawarneh ◽  
Muhammad Akhtar Ali

Object-relational technology provides a significant increase in scalability and flexibility over traditional relational databases. Its additional features are particularly attractive for advanced database applications with which relational database systems have had difficulties. The key factor in the success of object-relational database systems is their performance. This paper reviews the promises of object-relational database systems, examines the reality, and considers how those promises may be fulfilled through unification with relational technology. To investigate the performance implications of using object-relational rather than relational technology, the query-oriented BUCKY benchmark was previously applied to an early object-relational database system, Illustra 97. This paper presents the results of implementing and running the BUCKY benchmark on Oracle 10g and compares them with the results reported in the original BUCKY study. The study sheds light on the functionality of object-relational databases: object-relational technology has improved, but some limitations are identified as well. In general, relational performance exceeds that of the object-relational database system.


2016 ◽  
pp. 855-877
Author(s):  
Yongkwon Kim ◽  
Heejung Yang ◽  
Chin-Wan Chung

Modeling and simulation (M&S) are widely used for the design, analysis, and optimization of complex systems and natural phenomena in areas such as the defense industry and weather systems. In many cases, the environment is a key part of these systems and phenomena: it comprises the physical aspects of the real world that provide the context for a specific simulation. Recently, simulation systems have increasingly been integrated to work together when they need to exchange information. Interoperability of heterogeneous simulations depends heavily on sharing complex environmental data in a consistent and complete manner. SEDRIS (Synthetic Environmental Data Representation and Interchange Specification) is an ISO standard for the representation and interchange of environmental data and is widely adopted in the M&S area. As simulations grow, so does the volume of environmental data that must be exchanged between them, so efficient management of that data is very important. In this paper, the authors propose storage and retrieval methods for SEDRIS transmittals using a relational database system, so that an environmental data server cooperating with many heterogeneous distributed simulations can retrieve data efficiently. Relational database schemas are designed by analyzing the structure and content of SEDRIS transmittals. To reduce query processing time, direct storage and retrieval methods that do not require type conversion of SEDRIS transmittals are proposed. Experimental analyses are conducted to show the efficiency of the proposed approach. The results confirm that it greatly reduces storage and retrieval time relative to the approaches it is compared against.


Author(s):  
Mary Ann Malloy ◽  
Irena Mlynkova

As XML technologies have become a standard for data representation, it has become necessary to propose and implement efficient techniques for managing XML data. A natural option is to exploit the tools and functions offered by relational database systems. This approach has many detractors, chiefly because of the inefficiency caused by structural differences between XML data and relations. On the other hand, relational databases are a mature, verified and reliable technology for managing any kind of data, including XML documents. In this chapter, the authors provide an overview and classification of existing approaches to XML data management in relational databases. They view the problem from both state-of-the-practice and state-of-the-art perspectives. The authors describe the best-known current solutions, with their advantages and disadvantages. Finally, they discuss some open issues and their possible solutions.


1988 ◽  
Vol 27 (04) ◽  
pp. 177-183 ◽  
Author(s):  
P. J. Jasinski ◽  
H.-P. Meinzer ◽  
C. O. Köhler ◽  
B. Sandblad

Summary: The structure of computer-processed images is described. This is the basis for presenting a method to integrate traditional database concepts with images, which represent a certain class of nonformatted, heterogeneous information. The method consists of a special surrogate Image data type and the design of an Image Directory, which associates formatted data, raw digital images stored conceptually as a file of variable-length bit-string records, and look-up tables. Besides individual and aggregate-oriented retrieval, some tools supporting more sophisticated analyses of images are also suggested. To make the description of the solutions clear and truly methodological, basic notions of relational database technology and a de facto standard query language (SQL) have been applied. The method can be used to build various medical applications in which images and/or graphics constitute an important fraction of the information.


2012 ◽  
Vol 241-244 ◽  
pp. 2561-2564
Author(s):  
Lian Sheng Wang ◽  
Ting Jun Li ◽  
Tao Sun ◽  
Zhi Yong Liu ◽  
Xiao Dong Sun

An IETM database standardized on S1000D consists of XML documents. Because earlier relational databases handle XML documents poorly, this paper designs a storage and index pattern for the IETM database using relational XML database technology, based on the features of the modules and information objects standardized in S1000D. The approach has some significance for wider application.


F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 907
Author(s):  
Su-Cheng Haw ◽  
Aisyah Amin ◽  
Chee-Onn Wong ◽  
Samini Subramaniam

Background: XML (eXtensible Markup Language) is the standard for exchanging data over the World Wide Web, so an XML database must support not only efficient query processing but also the frequent update operations that follow from dynamically changing Web content. Most existing XML annotation relies on a labeling scheme that identifies the hierarchical position of each XML node. Maintaining such labels is costly, because an update can cause the whole XML tree to be re-labeled, and the impact is most noticeable on large datasets. A robust labeling scheme that avoids re-labeling is therefore crucial. Method: Here, we present ORD-GAP (named after Order Gap), a robust and persistent XML labeling scheme that supports dynamic updates. ORD-GAP assigns unique identifiers with gaps between XML nodes, from which the level and the Parent-Child (P-C), Ancestor-Descendant (A-D) and sibling relationships can easily be identified. ORD-GAP adopts the ORDPATH labeling scheme for any future insertion. Results: We demonstrate that ORD-GAP is robust enough for dynamic updates, and have implemented it in three use cases: (i) left-most, (ii) in-between and (iii) right-most insertion. Experimental evaluations on the DBLP dataset demonstrated that ORD-GAP outperformed existing approaches such as ORDPATH and ME Labeling in database storage size, data loading time and query retrieval. On average, ORD-GAP has the best storing and query retrieval time. Conclusion: The main contributions of this paper are: (i) a robust labeling scheme named ORD-GAP that assigns a gap between nodes to support future insertion, and (ii) an efficient mapping scheme, built upon the ORD-GAP labeling scheme, that transforms XML into an RDB effectively.
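The idea of leaving gaps so that insertions need no re-labeling can be sketched with plain integers. This is a generic illustration, not ORD-GAP's actual label encoding: the gap size of 10, the midpoint rule and the function names are assumptions, and the case of an exhausted gap is only signalled, not handled.

```python
GAP = 10  # assumed gap size; the abstract does not state the value ORD-GAP uses


def initial_labels(count, gap=GAP):
    """Label `count` siblings in document order, leaving `gap` between neighbours."""
    return [gap * (i + 1) for i in range(count)]


def insert_label(siblings, position, gap=GAP):
    """Insert a new sibling at index `position` without touching existing labels.

    Returns the new label, or None when no free integer remains between the
    two neighbours (a real scheme needs a fallback for that case).
    """
    left = siblings[position - 1] if position > 0 else 0
    if position == len(siblings):
        new = left + gap                 # right-most insertion: extend the sequence
    else:
        right = siblings[position]
        if right - left < 2:
            return None                  # gap exhausted between left and right
        new = (left + right) // 2        # left-most or in-between: take the midpoint
    siblings.insert(position, new)
    return new


labels = initial_labels(3)               # three existing siblings
leftmost = insert_label(labels, 0)       # before the first sibling
between = insert_label(labels, 2)        # between the original first and second
rightmost = insert_label(labels, len(labels))  # after the last sibling
```

The three calls correspond to the left-most, in-between and right-most cases named in the abstract. Existing labels never change, so rows already stored in a relational table stay valid after each insertion.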


Database ◽  
2020 ◽  
Vol 2020 ◽  
Author(s):  
Claire M Simpson ◽  
Florian Gnad

Abstract: Graph representations provide an elegant solution to capture and analyze complex molecular mechanisms in the cell. Co-expression networks are undirected graph representations of transcriptional co-behavior indicating (co-)regulations, functional modules or even physical interactions between the corresponding gene products. The growing avalanche of available RNA sequencing (RNAseq) data fuels the construction of such networks, which are usually stored in relational databases like most other biological data. Inferring linkage by recursive multiple-join statements, however, is computationally expensive and complex to design in relational databases. In contrast, graph databases store and represent complex interconnected data as nodes, edges and properties, making it fast and intuitive to query and analyze relationships. While graph-based database technologies are on their way from a fringe domain to going mainstream, there are only a few studies reporting their application to biological data. We used the graph database management system Neo4j to store and analyze co-expression networks derived from RNAseq data from The Cancer Genome Atlas. Comparing co-expression in tumors versus healthy tissues in six cancer types revealed significant perturbation tracing back to erroneous or rewired gene regulation. Applying centrality, community detection and pathfinding graph algorithms uncovered the destruction or creation of central nodes, modules and relationships in co-expression networks of tumors. Given the speed, accuracy and straightforwardness of managing these densely connected networks, we conclude that graph databases are ready for entering the arena of biological data.
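To show what a recursive multiple-join statement looks like in practice, the sketch below finds every gene within a given number of co-expression hops using a recursive common table expression in SQLite. It is not taken from the study: the `coexpr` table, the gene identifiers G1 to G6 and the helper name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE coexpr (gene_a TEXT, gene_b TEXT)")
conn.executemany(
    "INSERT INTO coexpr VALUES (?, ?)",
    [("G1", "G2"), ("G2", "G3"), ("G3", "G4"), ("G5", "G6")],  # made-up edges
)

QUERY = """
WITH RECURSIVE
  edge(src, dst) AS (              -- an undirected edge is stored once, read both ways
    SELECT gene_a, gene_b FROM coexpr
    UNION
    SELECT gene_b, gene_a FROM coexpr
  ),
  reach(gene, hops) AS (
    SELECT ?, 0                    -- start gene
    UNION
    SELECT edge.dst, reach.hops + 1
    FROM reach JOIN edge ON edge.src = reach.gene
    WHERE reach.hops < ?           -- hop limit guarantees termination on cycles
  )
SELECT gene, MIN(hops) FROM reach GROUP BY gene ORDER BY MIN(hops), gene
"""


def neighbours_within(connection, gene, max_hops):
    """Return (gene, shortest hop count) pairs reachable within max_hops."""
    return connection.execute(QUERY, (gene, max_hops)).fetchall()


result = neighbours_within(conn, "G1", 2)
```

Each additional hop is another join against the edge set, and the query has to carry its own hop counter and deduplication, which illustrates the design burden the abstract attributes to relational storage of networks.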

