ORD-GAP: a hybrid-based labeling schemes to support XML dynamic updates

Author(s):  
Aisyah Amin ◽  
Su-Cheng Haw ◽  
Samini Subramaniam

eXtensible Markup Language (XML) is widely used as the standard for data exchange over the Internet. As data volumes grow rapidly and updates become frequent, it is crucial that XML can cope with frequent changes while affecting the existing structure as little as possible. In this paper, we therefore investigate existing labeling schemes and mapping approaches to better understand the robustness of the labeling schemes and the importance of the mapping schemes. Next, we propose the ORD-GAP labeling scheme, which identifies the structural relationships among XML nodes and yet avoids re-labeling when new nodes are inserted. Subsequently, we propose a mapping scheme to transform XML into a relational database (RDB). Preliminary experimental evaluation shows that the proposed approach is 66% better than ORDPATH and 56% better than ME labeling in terms of data loading time.

F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 907
Author(s):  
Su-Cheng Haw ◽  
Aisyah Amin ◽  
Chee-Onn Wong ◽  
Samini Subramaniam

Background: Because the eXtensible Markup Language (XML) is the standard for exchanging data over the World Wide Web, an XML database must support not only efficient query processing but also frequent update operations as Web content changes. Most existing XML annotation relies on a labeling scheme that identifies the hierarchical position of each XML node. This is costly because any update causes the whole XML tree to be re-labeled, an impact that is noticeable on large datasets. A robust labeling scheme that avoids re-labeling is therefore crucial. Method: Here, we present ORD-GAP (named after Order Gap), a robust and persistent XML labeling scheme that supports dynamic updates. ORD-GAP assigns unique identifiers with gaps between XML nodes, from which the level and the Parent-Child (P-C), Ancestor-Descendant (A-D) and sibling relationships can easily be identified. ORD-GAP adopts the OrdPath labeling scheme for any future insertion. Results: We demonstrate that ORD-GAP is robust enough for dynamic updates and have implemented it in three use cases: (i) left-most, (ii) in-between and (iii) right-most insertion. Experimental evaluations on the DBLP dataset demonstrated that ORD-GAP outperformed existing approaches such as ORDPath and ME Labeling with respect to database storage size, data loading time and query retrieval. On average, ORD-GAP has the best storing and query retrieval times. Conclusion: The main contributions of this paper are: (i) a robust labeling scheme named ORD-GAP that assigns a certain gap between nodes to support future insertion, and (ii) an efficient mapping scheme, built upon the ORD-GAP labeling scheme, that transforms XML into RDB effectively.
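The gap idea described above can be illustrated with a minimal sketch. It is not the authors' algorithm: the gap size `GAP = 10`, the use of plain integer ordinals, and the helper names `initial_labels` and `insert_label` are all assumptions made for illustration. The sketch only shows how left-most, in-between and right-most insertion among siblings can succeed without changing any existing label.

```python
GAP = 10  # assumed gap size; the abstract does not state the value used


def initial_labels(n, gap=GAP):
    """Label n siblings with evenly spaced ordinals: gap, 2*gap, ..."""
    return [gap * (i + 1) for i in range(n)]


def insert_label(labels, pos, gap=GAP):
    """Return an ordinal for a new sibling placed at index pos.

    Existing labels are never modified. Returns None when no free
    integer remains at that position (the gap is exhausted).
    """
    left = labels[pos - 1] if pos > 0 else None
    right = labels[pos] if pos < len(labels) else None
    if left is None and right is None:
        return gap                      # first child under this parent
    if right is None:
        return left + gap               # right-most insertion
    lower = 0 if left is None else left  # left-most insertion uses 0 as bound
    if right - lower < 2:
        return None                     # no integer strictly between
    return (lower + right) // 2         # midpoint of the free gap


siblings = initial_labels(3)            # [10, 20, 30]
leftmost = insert_label(siblings, 0)    # falls between 0 and 10
between = insert_label(siblings, 1)     # falls between 10 and 20
rightmost = insert_label(siblings, 3)   # falls after 30
```

The abstract states that ORD-GAP adopts OrdPath labeling for future insertions; this sketch does not model that part and simply returns `None` when no free integer remains.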


2012 ◽  
Vol 10 (3) ◽  
pp. 13-26
Author(s):  
Xiaomin Zhu ◽  
Zhongxiang He ◽  
Shengbo Shi

Extensible Markup Language (XML) is a textual markup language that is becoming increasingly important in Internet web services. However, XML has some distinct disadvantages, such as its inherent redundancy, which consumes a great deal of the limited network bandwidth, especially in mobile computing. In mobile commerce, handset memory capacity and data processing time are two obstacles to applying XML. This paper studies an enhancement of XML for mobile e-commerce, called SXML (Simple XML), intended to improve the XML used in mobile web services. It helps XML producers minimize the size effects of XML, e.g., the size overhead and slow implementation speed. Comprehensive simulations show that SXML can reduce the size of XML documents and the implementation time, and consequently use bandwidth more effectively.


Extensible Markup Language (XML) technology is widely used for data exchange and data representation in both online and offline modes. This structured format can be transformed into other formats and used to share information across platforms. XML is simple, yet it is designed to accommodate change. This paper presents a study of the transformation of XML documents into a relational database. A crucial part of this process is preserving the hierarchy and the relationships between data in the document once they are stored in the database. Each approach discussed in this paper uses its own data storage technique and database design. Each algorithm is therefore assessed on three datasets consisting of small, medium and large XML files. The efficiency of the algorithms is tested in terms of the time taken for data storage and query execution. At the end of the evaluation, we discuss factors that affect algorithm performance and present suggestions for improving mapping schemes in future work.
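As a rough illustration of what keeping hierarchy and relationships in a relational database can look like, the sketch below stores every element as one row with a pointer to its parent and its position among siblings. This generic parent-pointer layout is an assumption chosen for brevity and is not one of the algorithms evaluated in the paper; the table name `node`, its columns, and the function `load_xml` are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def load_xml(conn, xml_text):
    """Store each XML element as a row; return the number of rows written."""
    conn.execute(
        "CREATE TABLE node ("
        "id INTEGER PRIMARY KEY, parent_id INTEGER, "
        "ord INTEGER, tag TEXT, text TEXT)"
    )
    root = ET.fromstring(xml_text)
    next_id = 0
    stack = [(root, None, 1)]  # (element, parent row id, sibling position)
    while stack:
        elem, parent_id, ordinal = stack.pop()
        next_id += 1
        node_id = next_id
        conn.execute(
            "INSERT INTO node VALUES (?, ?, ?, ?, ?)",
            (node_id, parent_id, ordinal, elem.tag, (elem.text or "").strip()),
        )
        # Children remember their parent row and their order among siblings.
        for position, child in enumerate(elem, start=1):
            stack.append((child, node_id, position))
    return next_id


conn = sqlite3.connect(":memory:")
doc = "<dblp><article><title>A</title><year>2021</year></article></dblp>"
row_count = load_xml(conn, doc)

# Parent-child query: children of <article>, in document order.
children = [
    row[0]
    for row in conn.execute(
        "SELECT tag FROM node WHERE parent_id = "
        "(SELECT id FROM node WHERE tag = 'article') ORDER BY ord"
    )
]
```

With this layout a parent-child lookup is a single filter on `parent_id`, whereas ancestor-descendant queries need repeated joins, which is one reason label-based schemes are studied.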


2015 ◽  
Vol 6 (4) ◽  
Author(s):  
Irvanizam Zamanhuri

Abstract. The eXtensible Markup Language (XML) has quickly become the de facto standard for data exchange via the web. An XML document can be viewed as an ordered tree that has at least one node. Each node must be labeled using a labeling scheme that describes the XML data structure. There are two well-known existing encodings, namely the Dewey and Interval encodings. In this paper, the ORDPATH encoding, which is based on Dewey, is empirically demonstrated together with these two encodings on the dblp, nasa, and treebank datasets. The results show that when a new node is inserted into the tree, Dewey has to relabel the inserted node's siblings and Interval has to modify the interval numbers of the sibling nodes. ORDPATH, in contrast, eliminates this problem by adding an even number that serves as a caret for the newly inserted node. Keywords: Ordered Tree, XML, ORDPATH.
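The caret mechanism mentioned in the abstract can be sketched as follows. This is a simplified illustration, not the paper's implementation: labels are modeled as Python tuples, the function names are hypothetical, and `caret_between` only covers the simplest case of inserting between two initially assigned siblings whose last components are consecutive odd numbers.

```python
def initial_child_labels(parent, n):
    """Assign odd ordinals 1, 3, 5, ... to n children of parent."""
    return [parent + (2 * i + 1,) for i in range(n)]


def caret_between(left, right):
    """Label for a node inserted between two adjacent siblings.

    Sketch only: both labels must share a prefix and end in
    consecutive odd numbers (for example ...3 and ...5).
    """
    same_parent = left[:-1] == right[:-1]
    consecutive_odds = left[-1] % 2 == 1 and right[-1] - left[-1] == 2
    if not (same_parent and consecutive_odds):
        raise ValueError("sketch only covers consecutive odd siblings")
    # The even number between them acts as a caret; a new odd ordinal follows.
    return left[:-1] + (left[-1] + 1, 1)


def level(label):
    """Depth of a node: even caret components do not add a level."""
    return sum(1 for component in label if component % 2 == 1)


kids = initial_child_labels((1,), 3)      # children of the root labeled (1,)
new_label = caret_between(kids[1], kids[2])
in_order = kids[1] < new_label < kids[2]  # tuple comparison gives document order
```

The existing sibling labels are left untouched, and the new node stays at the same level as its siblings because the even component is not counted as a level.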


Author(s):  
K. C. Morris ◽  
Puja Goyal ◽  
Simon Frechette

In enterprise integration, a data exchange specification is an architectural artifact that evolves along with the business. Developing and maintaining a coherent semantic model for data exchange is an important, yet non-trivial, task. A coherent semantic model of data exchange specifications supports reuse, promotes interoperability, and, consequently, reduces integration costs. Components of data exchange specifications must be consistent and valid in terms of agreed-upon standards and guidelines. In this paper, we describe an activity model and NIST-developed tools for the creation, testing, and maintenance of a shared semantic model that is coherent and supports scalable, standards-based enterprise integration. The activity model frames our research and helps define tools to support the development of data exchange specifications implemented using XML (Extensible Markup Language) Schema.


Author(s):  
Huiping Cao ◽  
Yan Qi ◽  
K. Selçuk Candan ◽  
Maria Luisa Sapino

Many applications require the exchange and integration of data from multiple, heterogeneous sources. eXtensible Markup Language (XML) is a standard developed to satisfy the data exchange needs of these applications. However, XML by itself does not address the data integration requirements. This chapter discusses the challenges and techniques in XML data integration. It first presents a four-step outline illustrating the steps involved in the integration of XML data. The chapter then focuses on the first two of these steps: schema extraction and data/schema mapping. More specifically, the discussion of schema extraction presents techniques to extract tree summaries, DTDs, or XML Schemas from XML documents. The discussion of data/schema mapping focuses on techniques for aligning XML data and schemas.


1999 ◽  
Vol 38 (03) ◽  
pp. 154-157
Author(s):  
W. Fierz ◽  
R. Grütter

Abstract. When dealing with biological organisms, one has to take into account some peculiarities which significantly affect the representation of knowledge about them. These are complemented by the limitations in the representation of propositional knowledge, i.e., the majority of clinical knowledge, by artificial agents. Thus, the opportunities to automate the management of clinical knowledge are largely restricted to closed contexts and to procedural knowledge. Therefore, in dynamic and complex real-world settings such as health care provision to HIV-infected patients, human and artificial agents must collaborate in order to optimize the time/quality antinomy of the services provided. At the implementation level, the overall requirement that follows is that the language used to model clinical contexts should be both human- and machine-interpretable. The eXtensible Markup Language (XML), which is used to develop an electronic study form, is evaluated against this requirement, and its contribution to the collaboration of human and artificial agents in the management of clinical knowledge is analyzed.

