Static Analysis for Event-Based XML Processing

2008 ◽  
Vol 15 (1) ◽  
Author(s):  
Anders Møller

Event-based processing of XML data, as exemplified by the popular SAX framework, is a powerful alternative to using W3C's DOM or similar tree-based APIs. The event-based approach is particularly well suited to processing large XML documents in a streaming fashion with minimal memory consumption.

This paper discusses challenges for creating program analyses for SAX applications. In particular, we consider the problem of statically guaranteeing that a given SAX program always produces only well-formed and valid XML output. We propose an analysis technique based on existing analyses of Servlets, string operations, and XML graphs.
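To make the event-based style concrete, here is a minimal sketch using Python's standard `xml.sax` module. The `ElementCounter` handler, the `summarize` helper, and the sample document are illustrative assumptions and do not come from the paper. The handler reacts to start and end events as the parser reads the input, so no document tree is built.

```python
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Hypothetical handler: counts elements and tracks nesting depth."""

    def __init__(self):
        super().__init__()
        self.counts = {}
        self.depth = 0
        self.max_depth = 0

    def startElement(self, name, attrs):
        # Called once per start tag, in document order.
        self.counts[name] = self.counts.get(name, 0) + 1
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def endElement(self, name):
        # Called once per end tag; only the current depth is kept as state.
        self.depth -= 1


def summarize(xml_bytes):
    """Parse the input as a stream of events and return the collected statistics."""
    handler = ElementCounter()
    xml.sax.parseString(xml_bytes, handler)
    return handler.counts, handler.max_depth


SAMPLE = b"<log><entry><msg>a</msg></entry><entry><msg>b</msg></entry></log>"
counts, max_depth = summarize(SAMPLE)
print(counts, max_depth)
```

The handler's state is a dictionary and two integers, whatever the size of the input, which is the memory advantage the abstract refers to.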

2006 ◽  
Vol 13 (16) ◽  
Author(s):  
Anders Møller

Event-based processing of XML data, as exemplified by the popular SAX framework, is a powerful alternative to using W3C's DOM or similar tree-based APIs. The event-based approach is particularly well suited to processing large XML documents in a streaming fashion with minimal memory consumption.

This paper discusses challenges and presents some considerations for creating program analyses for SAX applications. In particular, we consider the problem of statically guaranteeing that a given SAX application always produces only well-formed and valid XML output.
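For contrast with the static guarantee discussed above, the following sketch shows a purely dynamic check. It is not the paper's analysis. The `CheckedXmlWriter` class and the `emit_report` function are hypothetical names introduced for illustration. The writer keeps a stack of open elements and raises an error when an end tag does not match the innermost open element.

```python
from xml.sax.saxutils import escape


class CheckedXmlWriter:
    """Hypothetical output writer that checks tag balance at run time."""

    def __init__(self):
        self._open = []   # names of the currently open elements
        self._parts = []  # output fragments emitted so far

    def start(self, name):
        self._open.append(name)
        self._parts.append(f"<{name}>")

    def text(self, data):
        # Escape &, < and > so character data cannot introduce markup.
        self._parts.append(escape(data))

    def end(self, name):
        if not self._open or self._open[-1] != name:
            raise ValueError(f"unbalanced end tag: {name}")
        self._open.pop()
        self._parts.append(f"</{name}>")

    def result(self):
        if self._open:
            raise ValueError(f"unclosed elements: {self._open}")
        return "".join(self._parts)


def emit_report(writer, items):
    """Example producer: one <item> element per input string."""
    writer.start("report")
    for item in items:
        writer.start("item")
        writer.text(item)
        writer.end("item")
    writer.end("report")
    return writer.result()


xml_out = emit_report(CheckedXmlWriter(), ["a<b"])
print(xml_out)
```

A check of this kind only detects errors on the paths that actually execute, and it covers tag balance only, not validity against a schema. A static analysis aims to rule out such errors for all runs before the program is executed.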


2004 ◽  
Vol 11 (33) ◽  
Author(s):  
Aske Simon Christensen ◽  
Christian Kirkegaard ◽  
Anders Møller

We show that it is possible to extend a general-purpose programming language with a convenient high-level data-type for manipulating XML documents while permitting (1) precise static analysis for guaranteeing validity of the constructed XML documents relative to the given DTD schemas, and (2) a runtime system where the operations can be performed efficiently. The system, named Xact, is based on a notion of immutable XML templates and uses XPath for deconstructing documents. A companion paper presents the program analysis; this paper focuses on the efficient runtime representation.


Author(s):  
Alessandro Campi

This chapter describes a visual framework, called XQBE, that covers the most important aspects of XML data management, spanning the visualization of XML documents, the formulation of queries, the representation and specification of document schemata, the definition of integrity constraints, the formulation of updates, and the expression of reactive behaviors in response to data modifications. All these features are strongly unified by a common visual abstraction and a few recurrent paradigms, so as to provide a homogeneous and comprehensive environment that allows even users without advanced programming skills to deal with nontrivial XML data management and transformation tasks. The ambiguity inherent in any visual representation of richly expressive languages required a considerable effort of formalization in the semantics of XQBE, which eventually led to a solution with major advantages in terms of intuitiveness. In other words, the unique (and unambiguous) effect of a statement is the one the user would expect.


2021 ◽  
Vol 37 ◽  
pp. 301139
Author(s):  
Nitin Naik ◽  
Paul Jenkins ◽  
Nick Savage ◽  
Longzhi Yang ◽  
Tossapon Boongoen ◽  
...  

Author(s):  
Martin Probst

As the adoption of XML reaches more and more application domains, data sizes increase, and efficient XML handling becomes more and more important. Many applications face scalability problems due to the overhead of XML parsing, the difficulty of effectively finding particular XML nodes, or the sheer size of XML documents, which nowadays can easily exceed gigabytes of data. The latter issue in particular can make certain tasks seemingly impossible to handle, as many applications depend on parsing XML documents completely into a Document Object Model (DOM) memory structure. Parsing XML into a DOM typically requires close to, or even more than, the memory that the serialized XML would consume, making it prohibitively expensive to handle XML documents in the gigabyte range. Recent research and development suggests that it is possible to modify these applications to run a wide range of tasks in a streaming fashion, thus limiting the memory consumption of individual applications. However, this requires changes not only in the underlying tools but often also in user code, such as XSLT style sheets. These required changes can often be unintuitive and complicate user code. A different approach is to run applications against an efficient, persistent, hard-disk backed DOM implementation that does not require entire documents to be in memory at a time. This talk will discuss such a DOM implementation, EMC's xDB, showing how to use binary XML and efficient backend structures to provide a standards-compliant, non-memory-backed, transactional DOM implementation, with little overhead compared to regular memory-based DOMs. It will also give performance comparisons and show how to run existing applications transparently against xDB's DOM implementation, using XSLT style sheets as an example.
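The streaming alternative mentioned in the abstract can be sketched with `xml.etree.ElementTree.iterparse` from the Python standard library. This is not xDB, and the `sum_sizes` function, the `record` element, and its `size` attribute are assumptions made for the example. Each element is processed when its end tag is seen and is then cleared, so its children and text can be reclaimed.

```python
import io
import xml.etree.ElementTree as ET


def sum_sizes(stream, tag="record"):
    """Sum the 'size' attribute of every matching element while parsing."""
    total = 0
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == tag:
            total += int(elem.get("size", "0"))
            # Drop the element's children, text and attributes after use.
            # The emptied element itself stays attached to its parent.
            elem.clear()
    return total


SAMPLE = b'<data><record size="3"/><record size="4"><note>x</note></record></data>'
total = sum_sizes(io.BytesIO(SAMPLE))
print(total)
```

This also illustrates the abstract's point that streaming requires changes to user code: the logic has to be restructured around the order in which elements arrive, whereas DOM-style code can navigate the document freely.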


2019 ◽  
Vol 18 (04) ◽  
pp. 1950048
Author(s):  
Amjad Qtaish ◽  
Mohammad T. Alshammari

Extensible Markup Language (XML) has become a common language for data interchange and data representation on the Web. The evolution of the big data environment and the large volume of data represented in XML on the Web increase the challenges of effectively managing such data in terms of storing and querying. Numerous solutions have been introduced to store and query XML data, including file systems, Object-Oriented Databases (OODB), Native XML Databases (NXD), and Relational Databases (RDB). Previous research indicates that RDB is the most powerful technology for managing XML data to date. Because of the structural differences between XML and RDB, XML data must be mapped to an RDB schema. This need has prompted numerous researchers and database vendors to propose different approaches for mapping XML documents to an RDB, translating different types of XPath queries to SQL queries, and returning the results in XML format. This paper aims to review comprehensively, in a narrative manner, the most cited and latest mapping approaches and the database vendors that use an RDB solution to store and query XML documents. The advantages and drawbacks of each approach are discussed, particularly in terms of storing and querying. The paper also provides some insight into managing XML documents using an RDB solution in terms of storing and querying, and contributes to the XML community.
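One simple way to picture the mapping and query translation described above is a generic edge table, sketched here with Python's `sqlite3` and `xml.etree.ElementTree`. The `node` table layout and the `shred` and `path_to_sql` functions are assumptions for illustration and are not taken from any approach reviewed in the paper. Only absolute paths made of child steps, such as `/lib/book/title`, are handled.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred(conn, xml_text):
    """Store each element as one row: id, parent id, sibling position, tag, text."""
    conn.execute(
        "CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER, "
        "pos INTEGER, tag TEXT, content TEXT)"
    )
    next_id = 0

    def visit(elem, parent, pos):
        nonlocal next_id
        next_id += 1
        node_id = next_id  # ids are assigned in document (preorder) order
        conn.execute(
            "INSERT INTO node VALUES (?, ?, ?, ?, ?)",
            (node_id, parent, pos, elem.tag, (elem.text or "").strip()),
        )
        for i, child in enumerate(elem):
            visit(child, node_id, i)

    visit(ET.fromstring(xml_text), None, 0)


def path_to_sql(path):
    """Translate a child-axis-only path into one self-join per step."""
    steps = [s for s in path.split("/") if s]
    froms = ["node n0"]
    wheres = ["n0.parent IS NULL", "n0.tag = ?"]
    for i in range(1, len(steps)):
        froms.append(f"node n{i}")
        wheres.append(f"n{i}.parent = n{i - 1}.id")
        wheres.append(f"n{i}.tag = ?")
    last = f"n{len(steps) - 1}"
    sql = (
        f"SELECT {last}.content FROM {', '.join(froms)} "
        f"WHERE {' AND '.join(wheres)} ORDER BY {last}.id"
    )
    return sql, steps  # the step names are passed as bound parameters


conn = sqlite3.connect(":memory:")
shred(conn, "<lib><book><title>A</title></book><book><title>B</title></book></lib>")
sql, params = path_to_sql("/lib/book/title")
titles = [row[0] for row in conn.execute(sql, params)]
print(titles)
```

Each path step costs one self-join in this layout, which hints at why the choice of mapping has a large effect on query performance.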


Author(s):  
Pasquale De Meo ◽  
Giacomo Fiumara ◽  
Antonino Nocera ◽  
Domenico Ursino

In recent years, there has been an increase in the volume and heterogeneity of XML data sources. Moreover, these information sources often comprise both schemas and instances of XML data. In this context, the need to group similar XML documents together has led to increasing research on clustering algorithms for XML data. In this chapter, we present an overview of the most popular methods for clustering XML data sources, distinguishing between the intensional data level and the extensional data level, depending on whether the sources to cluster are DTDs and XML schemas, or XML documents; in the latter case, we focus on the structural information of the documents. We classify and describe techniques for computing similarities among XML data sources, and discuss methods for clustering DTDs/XML schemas and XML documents.
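As one simple illustration of a structural similarity between XML documents, the sketch below compares the sets of root-to-element tag paths using the Jaccard coefficient. This measure is chosen here for brevity and is not claimed to be one of the techniques surveyed in the chapter. The function names `tag_paths` and `structural_similarity` are hypothetical.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text):
    """Return the set of root-to-element tag paths in a document."""
    paths = set()

    def visit(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        paths.add(path)
        for child in elem:
            visit(child, path)

    visit(ET.fromstring(xml_text), "")
    return paths


def structural_similarity(doc_a, doc_b):
    """Jaccard coefficient of the two path sets (1.0 means identical path sets)."""
    a, b = tag_paths(doc_a), tag_paths(doc_b)
    return len(a & b) / len(a | b)


DOC_A = "<a><b/><c/></a>"
DOC_B = "<a><b/><d/></a>"
similarity = structural_similarity(DOC_A, DOC_B)
print(similarity)
```

A pairwise similarity of this kind can serve as input to a standard clustering algorithm. It ignores text content, repetition, and sibling order, so it only captures the structural information of the documents.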


Author(s):  
Ibrahim Dweib ◽  
Joan Lu

In this chapter, the authors characterize a new model for mapping XML documents into a relational database. The model addresses the structural gap between ordered, hierarchical XML and unordered, tabular relational databases, so that relational database systems can be used for storing, updating, and querying XML data. The authors introduce and implement a mapping system called MAXDOR to solve the problem.
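The gap between ordered XML and unordered tables can be illustrated with a small round-trip sketch. This is not MAXDOR. The `elem` table, its `doc_order` column, and the `store` and `rebuild` functions are assumptions for the example. Because rows in a table have no inherent order, the document order is stored explicitly and used when the document is rebuilt. Mixed content (text after child elements) is not handled.

```python
import sqlite3
import xml.etree.ElementTree as ET


def store(conn, xml_text):
    """Insert one row per element, numbering elements in document order."""
    conn.execute(
        "CREATE TABLE elem (doc_order INTEGER, parent_order INTEGER, "
        "tag TEXT, content TEXT)"
    )
    counter = 0

    def visit(node, parent_order):
        nonlocal counter
        counter += 1
        my_order = counter
        conn.execute(
            "INSERT INTO elem VALUES (?, ?, ?, ?)",
            (my_order, parent_order, node.tag, node.text),
        )
        for child in node:
            visit(child, my_order)

    visit(ET.fromstring(xml_text), None)


def rebuild(conn):
    """Recreate the document; ORDER BY restores the original sibling order."""
    nodes = {}
    root = None
    rows = conn.execute(
        "SELECT doc_order, parent_order, tag, content FROM elem ORDER BY doc_order"
    )
    for doc_order, parent_order, tag, content in rows:
        node = ET.Element(tag)
        node.text = content
        nodes[doc_order] = node
        if parent_order is None:
            root = node
        else:
            # Parents always precede their children in document order.
            nodes[parent_order].append(node)
    return ET.tostring(root, encoding="unicode")


DOC = "<a><b>1</b><c>2</c><b>3</b></a>"
conn = sqlite3.connect(":memory:")
store(conn, DOC)
restored = rebuild(conn)
print(restored)
```

Numbering elements consecutively keeps the example short, but inserting a new element then requires renumbering later rows. That update cost is one reason order encoding is a design problem in its own right.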


2006 ◽  
Vol 22 (01) ◽  
pp. 9-14
Author(s):  
Wen-Yen Chien ◽  
Heiu-Jou Shaw ◽  
Shen-Ming Wang

In this article, Taiwan Ship Net is proposed as a portal Web site for the Taiwan yacht industry. Using Microsoft Distributed interNet Application (DNA) and Active Server Pages (ASP), the supply chain management system, which consists of three sections (shipyards, suppliers, and administrator), exchanges and transfers the information of the enterprise's system. By applying XML to conduct the information exchange, we integrate the order formats of several yacht manufacturers and suppliers, develop a system for placing orders online, and edit XML documents to exchange information within the system. E-catalogs that are easy to preserve and update will replace the traditional paper catalogs hereafter. The various demands for accessories will result in an increase in the costs of production and logistics. Through this system, yacht manufacturers can search the materials in the E-catalogs and purchase online directly. Meanwhile, it will allow the suppliers to update the E-catalogs regularly to be more competitive in the market. Taiwan Ship Net can provide the newest information to foreign customers and communicate with all yacht companies, equipment suppliers, and manufacturers in Taiwan.

