A XML Document Coding Schema Based on Binary

2014 ◽  
Vol 496-500 ◽  
pp. 1877-1880
Author(s):  
Dong Juan Gu ◽  
Li Yong Wan

To address inefficient XML data queries and to support dynamic updates, this paper proposes an improved method for encoding XML document nodes. Building on region encoding and prefix encoding, it introduces a binary-based XML document coding schema (CSBB). CSBB uses a binary encoding strategy and keeps inserted bit strings in order. The bit-string insertion algorithm generates ordered bit strings that reserve space for newly inserted nodes without affecting the codes of existing nodes. Experiments show that CSBB effectively avoids re-encoding nodes and supports dynamic node updates.
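The abstract does not spell out the CSBB insertion algorithm, so the following is only a minimal sketch of the general idea of order-preserving bit-string insertion, not the authors' method. It assumes codes are bit strings that always end in "1" and are compared lexicographically; the function name `code_between` is hypothetical.

```python
def code_between(left: str, right: str) -> str:
    """Return a bit string that sorts strictly between left and right.

    Assumptions (not taken from the paper): every code ends in "1",
    codes are compared lexicographically, and an empty string means
    "no neighbour on that side".
    """
    if len(left) >= len(right):
        # Appending "1" keeps the new code above left and below right.
        return left + "1"
    # Otherwise rewrite right's trailing "1" as "01" to slip just below it.
    return right[:-1] + "01"


first = code_between("", "")          # first node in an empty sibling list
after = code_between(first, "")       # append after the last sibling
before = code_between("", first)      # insert before the first sibling
middle = code_between(first, after)   # insert between two existing siblings
```

Existing codes are never changed by an insertion, which is the property the abstract describes as avoiding re-encoding.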

2014 ◽  
Vol 556-562 ◽  
pp. 3347-3349
Author(s):  
Yao Wen Xia ◽  
Ji Li Xie

From the perspective of XML data management, this paper first stores large volumes of XML data in HDFS and rewrites the traditional MapReduce processing framework for XML data queries. It then designs a keyword retrieval algorithm for large XML data sets, consisting of four parts (XML data classification, coding, indexing, and search), which solves the keyword retrieval problem for large XML documents. Finally, it describes the design and implementation of a MapReduce-based keyword query system for large XML data.
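As a rough illustration of how coding and indexing can be split into map and reduce steps, here is a single-process sketch that builds a keyword index over XML element text. It is an assumption-laden stand-in, not the system described in the abstract: it runs in memory rather than on HDFS or a MapReduce cluster, uses Dewey-style labels as the node code, and the names `map_phase` and `reduce_phase` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def map_phase(doc_id, xml_text):
    """Emit (keyword, (doc_id, label)) for each word in an element's text."""
    root = ET.fromstring(xml_text)
    stack = [(root, (1,))]
    while stack:
        node, label = stack.pop()
        for word in (node.text or "").lower().split():
            yield word, (doc_id, label)
        # Children get the parent's label plus their 1-based position.
        for i, child in enumerate(node, start=1):
            stack.append((child, label + (i,)))


def reduce_phase(pairs):
    """Group postings by keyword, as a reducer would after the shuffle."""
    index = defaultdict(list)
    for word, posting in pairs:
        index[word].append(posting)
    return {word: sorted(postings) for word, postings in index.items()}


docs = {"d1": "<a><b>xml query</b><c>xml</c></a>"}
pairs = [p for doc_id, text in docs.items() for p in map_phase(doc_id, text)]
index = reduce_phase(pairs)
```

A keyword lookup is then a dictionary access on `index`, returning the documents and node labels where the word occurs.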


2015 ◽  
Vol 2015 ◽  
pp. 1-11 ◽  
Author(s):  
Yue Zhao ◽  
Ye Yuan ◽  
Guoren Wang

This paper describes a keyword search method for probabilistic XML data based on ELM (extreme learning machine). Keyword search on a probabilistic XML document differs from keyword search on a traditional XML document in that it must consider possible-world semantics. A probabilistic XML document can be seen as a set of nodes consisting of ordinary nodes and distributional nodes. ELM performs well in text classification applications. XML is typical semistructured data whose labels are self-describing, so the label and context of a node can be treated as the text data of that node. ELM offers significant advantages such as fast learning speed, ease of implementation, and effective node classification. Set intersection can compute the SLCA (smallest lowest common ancestor) quickly over the node sets classified by ELM. In this paper, we adopt ELM to classify nodes and compute probabilities. We propose two algorithms based on ELM and a probability threshold to improve overall performance. The experimental results verify the benefits of our methods according to various evaluation metrics.
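To make the set-intersection step concrete, the sketch below computes SLCA nodes for a keyword query on an ordinary (non-probabilistic) XML tree. It leaves out the ELM classification and the probability computation entirely, and it assumes nodes are identified by Dewey-style tuples; the function names are hypothetical.

```python
def ancestors_or_self(label):
    """All prefixes of a Dewey-style label, including the label itself."""
    return {label[:i] for i in range(1, len(label) + 1)}


def slca(keyword_matches):
    """Return the SLCA nodes for a non-empty list of per-keyword match sets.

    Each entry of keyword_matches is the set of labels of nodes that
    directly contain one keyword.
    """
    # For each keyword, collect every node whose subtree contains it.
    covering = [
        set().union(*(ancestors_or_self(label) for label in matches))
        for matches in keyword_matches
    ]
    # Nodes whose subtree contains all keywords.
    common = set.intersection(*covering)
    # Keep only the nodes that have no descendant in the common set.
    return {
        c for c in common
        if not any(o != c and o[:len(c)] == c for o in common)
    }


matches_k1 = {(1, 1, 1), (1, 2, 1)}
matches_k2 = {(1, 1, 2), (1, 3)}
result = slca([matches_k1, matches_k2])
```

Here both the root `(1,)` and node `(1, 1)` cover the two keywords, and only the deeper node `(1, 1)` is kept.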


Because it can represent data from a wide variety of sources, XML is rapidly emerging as the new standard for data representation and exchange on the Web and in e-government. To use XML data effectively in practice, entity resolution, which has proven extremely useful in data fusion, inconsistency detection, and data repairing, must be in place to improve the quality of the XML data. In this chapter, the authors deal specifically with object identification on XML data, whose applications include XML document management in highly dynamic settings such as the Web and peer-to-peer systems, detection of duplicate elements in nested XML data, and finding similar identities among objects from multiple Web sources. The authors survey pairwise and groupwise entity resolution techniques for XML data. These techniques use structural information to describe the similarity or distance between XML data, such as XML documents and the elements within them, and either find the matching pairs that describe the same object or classify the data into separate groups, each corresponding to one real-world object. XML structure and content can be described in many ways, such as a tree, a Bayesian network, or a set. The authors introduce some well-known algorithms based on these structures for solving XML data matching problems. Finally, the authors discuss directions for future research.


2014 ◽  
Vol 1044-1045 ◽  
pp. 995-998
Author(s):  
Jia Ying

The article analyzes the shortcomings of P_schema and proposes an improved method, P_schema++. Nested structures, multi_citing elements, and alternative elements are extracted to form a new type, which is then mapped to a relational table. P_schema++ provides a method for storing complex XML documents in a relational database (RDB).


2015 ◽  
Vol 6 (4) ◽  
Author(s):  
Irvanizam Zamanhuri

Abstract. The eXtensible Markup Language (XML) has quickly become the de facto standard for data exchange via the web. An XML document can be viewed as an ordered tree that has at least one node. Each node must be labeled using a labeling scheme in order to describe the XML data structure. There are two well-known existing encodings, namely the Dewey and Interval encodings. In this paper, the ORDPATH encoding, which is based on Dewey, is empirically demonstrated together with the two other encodings on the dblp, nasa, and treebank datasets. The results show that when a new node is inserted into the tree, Dewey has to relabel the inserted node's siblings and Interval has to modify the interval numbers of the sibling nodes. ORDPATH eliminates this problem by adding an even number that is used as a caret for the newly inserted node.
Keywords: Ordered Tree, XML, ORDPATH.
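The difference between Dewey labels and ORDPATH labels on insertion can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: labels are Python tuples, initial ORDPATH ordinals are the odd numbers 1, 3, 5, and `ordpath_between` (a hypothetical helper) only handles insertion between two adjacent siblings from the initial load.

```python
import xml.etree.ElementTree as ET


def dewey_labels(root):
    """List (tag, label) pairs using child ordinals 1, 2, 3, ..."""
    out = []

    def walk(node, label):
        out.append((node.tag, label))
        for i, child in enumerate(node, start=1):
            walk(child, label + (i,))

    walk(root, (1,))
    return out


def ordpath_labels(root):
    """List (tag, label) pairs using only odd child ordinals 1, 3, 5, ..."""
    out = []

    def walk(node, label):
        out.append((node.tag, label))
        for i, child in enumerate(node):
            walk(child, label + (2 * i + 1,))

    walk(root, (1,))
    return out


def ordpath_between(left, right):
    """Label for a node inserted between two adjacent initial siblings.

    The even number between the two odd ordinals acts as the caret,
    followed by an odd ordinal for the new node itself.
    """
    assert left[:-1] == right[:-1] and right[-1] == left[-1] + 2
    return left[:-1] + (left[-1] + 1, 1)


root = ET.fromstring("<a><b/><c/><d/></a>")
dewey = dewey_labels(root)
ordpath = ordpath_labels(root)
new_label = ordpath_between((1, 1), (1, 3))
```

With Dewey, inserting a node between `b` and `c` would force `c` and `d` to be renumbered. With the ORDPATH-style labels, the new node receives `(1, 2, 1)`, which sorts between `(1, 1)` and `(1, 3)` while the existing labels stay unchanged.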


Author(s):  
Szabolcs Payrits ◽  
Péter Dornbach ◽  
István Zólyomi

Mapping XML document schemas and Web Service interfaces to programming languages plays an important role in the effective creation of quality Web Service implementations. The authors present a novel way to map XML data to the C++ programming language. The proposed solution offers more flexibility and more compact code, which makes it well suited to embedded environments. The article describes the concept and the architecture of the solution and compares it with existing solutions. This article is an extended version of the paper from ICWS 2006. The authors include a broader comparison with existing tools on the Symbian and Linux platforms and evaluate code size and performance.

