XML Dataset and Benchmarks for Performance Testing of the CLS Labelling Scheme

2021 ◽  
Vol 20 (2) ◽  
pp. 12-15
Author(s):  
Alhadi A. Klaib

Extensible Markup Language (XML) has become a significant technology for transferring data over the Internet. XML labelling schemes are an essential technique for handling XML data effectively. An XML document is labelled by assigning a label to every node in the document. The CLS labelling scheme is a hybrid labelling scheme that was developed to address some limitations of indexing XML data. Datasets and benchmarks are used to test XML labelling schemes, and many are available today: some are real-life datasets and others are synthetic. This paper reviews these datasets and benchmarks and their specifications in order to determine the most appropriate one for testing the CLS labelling scheme. The study found that the XMark benchmark is the most appropriate choice for testing the performance of the CLS labelling scheme.

2015 ◽  
Vol 6 (4) ◽  
Author(s):  
Irvanizam Zamanhuri

Abstract. The eXtensible Markup Language (XML) has quickly become the de facto standard for data exchange via the web. An XML document can be viewed as an ordered tree that has at least one node. Each node must be labelled using a labelling scheme that describes the structure of the XML data. There are two well-known existing encodings, namely the Dewey and Interval encodings. In this paper, the ORDPATH encoding, which is based on Dewey, is empirically demonstrated together with the two other encodings on the dblp, nasa, and treebank datasets. The results show that when a new node is inserted into the tree, Dewey has to relabel the inserted node’s siblings and Interval has to modify the interval numbers of the sibling nodes. ORDPATH eliminates this problem by adding an even number that is used as a caret for the newly inserted node. Keywords: Ordered Tree, XML, ORDPATH.
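The three encodings named in this abstract can be illustrated with a small sketch. It is not the paper's implementation: the function names and the sample document are assumptions made for illustration, and `ordpath_between` covers only the simplest insertion case (two adjacent siblings from the initial load).

```python
import xml.etree.ElementTree as ET


def dewey_labels(root):
    """Return (tag, label) pairs in document order; a child's label is its
    parent's label plus its 1-based position among its siblings."""
    out = []

    def visit(node, label):
        out.append((node.tag, label))
        for pos, child in enumerate(node, start=1):
            visit(child, label + (pos,))

    visit(root, (1,))
    return out


def interval_labels(root):
    """Return (tag, start, end) triples; a node's interval encloses the
    intervals of all its descendants."""
    out = []
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        entry = [node.tag, counter, None]
        out.append(entry)
        for child in node:
            visit(child)
        counter += 1
        entry[2] = counter

    visit(root)
    return [tuple(e) for e in out]


def ordpath_labels(root):
    """Like Dewey, but the initial load uses odd ordinals only (1, 3, 5, ...),
    leaving the even numbers free for later insertions."""
    out = []

    def visit(node, label):
        out.append((node.tag, label))
        for pos, child in enumerate(node, start=1):
            visit(child, label + (2 * pos - 1,))

    visit(root, (1,))
    return out


def ordpath_between(left, right):
    """Label for a node inserted between two adjacent siblings from the
    initial load: the even ordinal between them acts as a caret and is
    followed by an odd ordinal. Simplified: handles only this one case."""
    same_parent = left[:-1] == right[:-1]
    if not same_parent or left[-1] % 2 == 0 or right[-1] - left[-1] != 2:
        raise ValueError("expected adjacent siblings with odd ordinals")
    return left[:-1] + (left[-1] + 1, 1)


# Hypothetical four-node document used only for illustration.
doc = ET.fromstring("<a><b/><c><d/></c></a>")
print(dewey_labels(doc))
print(interval_labels(doc))
print(ordpath_labels(doc))
print(ordpath_between((1, 1), (1, 3)))  # (1, 2, 1); b and c keep their labels
```

With Dewey labels, inserting a node between `b` (1.1) and `c` (1.2) would force `c` and its subtree to be renumbered; with the odd-only ORDPATH labels the new node receives 1.2.1 and sorts between 1.1 and 1.3 without touching either sibling.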


Author(s):  
Mohammed Ragheb Hakawati ◽  
Yasmin Yacob ◽  
Amiza Amir ◽  
Jabiry M. Mohammed ◽  
Khalid Jamal Jadaa

Extensible Markup Language (XML) is emerging as the primary standard for representing and exchanging data; accounting for more than 60% of the total, XML is considered the most dominant document type on the web. Nevertheless, the quality of XML documents is not as expected. XML integrity constraints, especially XML functional dependencies (XFDs), play an important role in keeping an XML dataset as consistent as possible, but their ability to solve data quality issues remains limited. The main reason is that traditional data dependencies were introduced to maintain the consistency of the schema rather than that of the data. The purpose of this study is to introduce a method for discovering pattern tableaus for XML conditional dependencies, to be used for enhancing XML document consistency as part of the data quality improvement phases. The notations of the conditional dependencies, as new rules, are designed mainly for improving data instances, and they extend traditional XML dependencies by enforcing pattern tableaus of semantically related constants. Subsequently, a set of minimal approximate conditional dependencies (XCFD, XCIND) is discovered and learned from the XML tree using a set of mining algorithms. The discovered patterns can be used as master data in order to detect inconsistencies that do not respect the majority of the dataset.


1998 ◽  
Vol 54 (6) ◽  
pp. 1065-1070 ◽  
Author(s):  
Peter Murray-Rust

The rapid growth of the World Wide Web provides major new opportunities for distributed databases, especially in macromolecular science. A new generation of technology, based on structured documents (SD), is being developed which will integrate documents and data in a seamless manner. This offers experimentalists the chance to publish and archive high-quality data from any discipline. Data and documents from different disciplines can be combined and searched using technology such as eXtensible Markup Language (XML) and its associated support for hypermedia (XLL), metadata (RDF) and stylesheets (XSL). Opportunities in crystallography and related disciplines are described.


2000 ◽  
Vol 4 (1) ◽  
pp. 47-50
Author(s):  
Rick Elam ◽  
Zabihollah Rezaee

The purpose of this article is to describe the shift of business-to-business trading from Electronic Data Interchange (EDI) to extranets and to discuss some of the internal control challenges created by extranets and the eXtensible Markup Language (XML). This technology raises internal control issues because extranets use the World Wide Web to communicate and because XML is such a powerful and flexible programming language.


2021 ◽  
Vol 10 (6) ◽  
pp. 3256-3264
Author(s):  
Su-Cheng Haw ◽  
Emyliana Song

eXtensible markup language (XML) has emerged internationally as the format for data representation over the web. Yet most organizations still use relational databases as their database solutions. It is therefore crucial to provide seamless integration via effective transformation between these database infrastructures. In this paper, we propose XML-REG to bridge these two technologies based on node-based and path-based approaches. The node-based approach is well suited to annotating each positional node uniquely, while the path-based approach provides summarised path information to join the nodes. In addition, a new range labelling is proposed that annotates nodes uniquely while ensuring that the structural relationships between nodes are maintained. If a new node is added to the document, re-labelling is not required, as the new label is assigned to the node via the proposed labelling scheme. Experimental evaluations indicated that XML-REG outperformed XMap, XRecursive, XAncestor and Mini-XML in storing time, query retrieval time and scalability. This research produces a core framework for XML to relational database (RDB) mapping, which could be adopted in various industries.
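The general idea of combining per-node rows with a summarised path table can be sketched as follows. This is a generic illustration and not the XML-REG schema: the table layout, the column names, the `shred` function and the sample document are all assumptions, and the range labelling proposed in the paper is not reproduced here.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred(xml_text):
    """Load an XML document into two in-memory tables: one row per distinct
    root-to-node path, and one row per element pointing at its path."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE path (path_id INTEGER PRIMARY KEY, path TEXT UNIQUE)")
    db.execute(
        "CREATE TABLE node (node_id INTEGER PRIMARY KEY, parent_id INTEGER, "
        "path_id INTEGER REFERENCES path(path_id), value TEXT)"
    )

    def visit(elem, parent_id, parent_path):
        path = f"{parent_path}/{elem.tag}"
        # Each distinct path is stored once, however many nodes share it.
        db.execute("INSERT OR IGNORE INTO path (path) VALUES (?)", (path,))
        (path_id,) = db.execute(
            "SELECT path_id FROM path WHERE path = ?", (path,)
        ).fetchone()
        value = (elem.text or "").strip() or None
        cur = db.execute(
            "INSERT INTO node (parent_id, path_id, value) VALUES (?, ?, ?)",
            (parent_id, path_id, value),
        )
        for child in elem:
            visit(child, cur.lastrowid, path)

    visit(ET.fromstring(xml_text), None, "")
    return db


# Hypothetical sample document used only for illustration.
db = shred("<lib><book><title>A</title></book><book><title>B</title></book></lib>")
rows = db.execute(
    "SELECT n.value FROM node n JOIN path p ON n.path_id = p.path_id "
    "WHERE p.path = '/lib/book/title' ORDER BY n.node_id"
).fetchall()
print(rows)  # [('A',), ('B',)]
```

A path query then becomes a single join between the two tables, instead of a chain of parent-child self-joins on the node table.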


2012 ◽  
Vol 433-440 ◽  
pp. 6509-6513
Author(s):  
Li Min Cai

This paper introduces XML (Extensible Markup Language) and describes a remote temperature monitoring system. The system is built around the Samsung S3C2440 microprocessor, to which an embedded Linux system and a web server are ported. The DS18B20 digital temperature sensor collects temperatures on site, the acquired data is saved in an XML document, and the real-time on-site temperature can be displayed in a browser at a remote end. Results from actual runs show that the system is effective.
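Saving sensor readings in an XML document might look like the sketch below. The element and attribute names, the `append_reading` helper, the sensor identifier and the sample values are all made up for illustration; the abstract does not describe the actual document layout, and reading the DS18B20 itself is out of scope here.

```python
import xml.etree.ElementTree as ET


def append_reading(log, sensor_id, timestamp, celsius):
    """Append one <reading> element to the log root and return it."""
    reading = ET.SubElement(log, "reading", sensor=sensor_id, time=timestamp)
    reading.text = f"{celsius:.1f}"
    return reading


# Illustrative values only; a real system would take them from the sensor.
log = ET.Element("temperature_log")
append_reading(log, "ds18b20-0", "2012-01-01T12:00:00", 23.4375)
append_reading(log, "ds18b20-0", "2012-01-01T12:00:10", 23.5)
xml_text = ET.tostring(log, encoding="unicode")
print(xml_text)

# A remote page would typically show the most recent reading.
latest = ET.fromstring(xml_text).findall("reading")[-1]
print(latest.get("time"), latest.text)
```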


2011 ◽  
pp. 234-253 ◽  
Author(s):  
Steffen Staab ◽  
Michael Erdmann ◽  
Alexander Maedche ◽  
Stefan Decker

The development of the World Wide Web is about to mature from a technical platform that allows for the transportation of information from sources to humans (albeit in many syntactic formats) to the communication of knowledge from Web sources to machines. The knowledge food chain has started with technical protocols and preliminary formats for information presentation (HTML–HyperText Markup Language) over a general methodology for separating information contents from layout (XML–eXtensible Markup Language, XSL–eXtensible Stylesheet Language) to reach the realms of knowledge provisioning by the means of RDF and RDFS.


Author(s):  
Hadj Mahboubi

With the eXtensible Markup Language (XML) becoming a standard for representing business data (Beyer et al., 2005), a new trend toward XML data warehousing has been emerging for a couple of years, as well as efforts for extending the XQuery language with near On-Line Analytical Processing (OLAP) capabilities (grouping, aggregation, etc.). Though this is not an easy task, these new approaches, techniques and architectures aim at taking specificities of XML into account (e.g., heterogeneous number and order of dimensions or complex measures in facts, ragged dimension hierarchies…) that would be intricate to handle in a relational environment. The aim of this article is to present an overview of the major XML warehousing approaches from the literature, as well as the existing approaches for performing OLAP analyses over XML data (which is termed XML-OLAP or XOLAP; Wang et al., 2005). We also discuss the issues and future trends in this area and illustrate this topic by presenting the design of a unified, XML data warehouse architecture and a set of XOLAP operators expressed in an XML algebra.
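Grouping and aggregation over XML facts, the kind of operation an XOLAP roll-up performs, can be sketched in a few lines. The document layout, the `roll_up` function and the figures are assumptions made for illustration; they are not the operators or the algebra presented in the article. The third fact deliberately lacks a `year` element to show the heterogeneous dimensions mentioned above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical facts; the last one has no <year> dimension.
FACTS = """
<sales>
  <fact><region>north</region><year>2004</year><amount>10</amount></fact>
  <fact><region>north</region><year>2005</year><amount>5</amount></fact>
  <fact><region>south</region><amount>7</amount></fact>
</sales>
"""


def roll_up(xml_text, dimension, measure):
    """Sum `measure` over all <fact> elements, grouped by `dimension`.
    Facts that lack the dimension are grouped under None."""
    totals = defaultdict(float)
    for fact in ET.fromstring(xml_text).iter("fact"):
        key = fact.findtext(dimension)
        totals[key] += float(fact.findtext(measure, default="0"))
    return dict(totals)


print(roll_up(FACTS, "region", "amount"))  # {'north': 15.0, 'south': 7.0}
print(roll_up(FACTS, "year", "amount"))    # {'2004': 10.0, '2005': 5.0, None: 7.0}
```

A relational fact table would need a placeholder value or a nullable column for the missing dimension, whereas the XML fact simply omits the element.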


Author(s):  
Tanja Toroi ◽  
Anne Eerola

Interoperability of software systems is a critical, ever-increasing requirement in the software industry. Conformance testing is needed to assure conformance of software and interfaces to standards and other specifications. In this chapter we briefly review what has been done in conformance testing around the world and in Finland. We also propose testability requirements for the specifications used in conformance testing and examine test-case derivation from different kinds of specifications. Furthermore, we present a conformance-testing environment for the healthcare domain, developed in the OpenTE project, consisting of different service-specific and shared testing services. In our testing environment, testing is performed against open interfaces, and test cases can, for example, be in XML (extensible markup language) or CDA R2 (clinical document architecture, Release 2) form.


Author(s):  
Michael Lang

Although its conceptual origins can be traced back a few decades (Bush, 1945), it is only recently that hypermedia has become popularized, principally through its ubiquitous incarnation as the World Wide Web (WWW). In its earlier forms, the Web could only properly be regarded as a primitive, constrained hypermedia implementation (Bieber & Vitali, 1997). Through the emergence in recent years of standards such as eXtensible Markup Language (XML), XLink, Document Object Model (DOM), Synchronized Multimedia Integration Language (SMIL) and WebDAV, as well as additional functionality provided by the Common Gateway Interface (CGI), Java, plug-ins and middleware applications, the Web is now moving closer to an idealized hypermedia environment. Of course, not all hypermedia systems are Web based, nor can all Web-based systems be classified as hypermedia (see Figure 1). See the terms and definitions at the end of this article for clarification of intended meanings. The focus here shall be on hypermedia systems that are delivered and used via the platform of the WWW; that is, Web-based hypermedia systems.

