Processing Arbitrarily Large XML using a Persistent DOM

Author(s):  
Martin Probst

As the adoption of XML reaches more and more application domains, data sizes increase, and efficient XML handling becomes increasingly important. Many applications face scalability problems due to the overhead of XML parsing, the difficulty of efficiently finding particular XML nodes, or the sheer size of XML documents, which nowadays can easily exceed gigabytes of data. The latter issue in particular can make certain tasks seemingly impossible to handle, as many applications depend on parsing XML documents completely into a Document Object Model (DOM) memory structure. Parsing XML into a DOM typically requires nearly as much memory as the serialized XML occupies, or even more, making it prohibitively expensive to handle XML documents in the gigabyte range. Recent research and development suggests that it is possible to modify these applications to run a wide range of tasks in a streaming fashion, thus limiting the memory consumption of individual applications. However, this requires changes not only in the underlying tools but often also in user code, such as XSLT stylesheets. These changes can be unintuitive and can complicate user code. A different approach is to run applications against an efficient, persistent, hard-disk-backed DOM implementation that does not require entire documents to be in memory at once. This talk will discuss such a DOM implementation, EMC's xDB, showing how to use binary XML and efficient backend structures to provide a standards-compliant, non-memory-backed, transactional DOM implementation with little overhead compared to regular memory-based DOMs. It will also give performance comparisons and show how to run existing applications transparently against xDB's DOM implementation, using XSLT stylesheets as an example.
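
The trade-off between a fully materialized tree and streaming can be illustrated with a minimal sketch. It uses Python's standard `xml.etree.ElementTree` module, not xDB, whose API is not described here; the function names and the sample document are hypothetical and chosen only for illustration.

```python
import io
import xml.etree.ElementTree as ET


def count_in_memory(xml_bytes: bytes, tag: str) -> int:
    """Build the complete tree in memory, then count matching elements."""
    root = ET.fromstring(xml_bytes)
    return sum(1 for _ in root.iter(tag))


def count_streaming(xml_bytes: bytes, tag: str) -> int:
    """Count matching elements incrementally, releasing each one once seen."""
    count = 0
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag == tag:
            count += 1
            # Release the element's text, attributes, and children.
            # The emptied element itself stays attached to its parent.
            elem.clear()
    return count


# Hypothetical sample document with 1000 small records.
SAMPLE = (
    b"<log>"
    + b"".join(b"<entry id='%d'>x</entry>" % i for i in range(1000))
    + b"</log>"
)

in_memory_total = count_in_memory(SAMPLE, "entry")
streaming_total = count_streaming(SAMPLE, "entry")
```

Both functions give the same count, but the streaming version has to be written around the order in which elements arrive. This is the kind of change to user code that the abstract describes as unintuitive.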

Author(s):  
Alain Couthures

Document object models, specifically the browser DOM, were designed to represent HTML and XML documents. Languages such as XPath were designed to access and traverse the DOM of HTML and XML documents. But suppose we wanted to bring the power and convenience of XML technologies like XPath to new data types. Could we extend the DOM to support CSV files? JSON? ZIP files? Yes we can! This paper explores a number of ways in which the DOM can be made to do more. We can loosen restrictions, describe new sequence types, and even define new XPath axes to make the DOM better and more useful.
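
As a rough illustration of querying non-XML data with path expressions, the sketch below maps a CSV file onto an element tree and queries it with the limited XPath subset that Python's `xml.etree.ElementTree` supports. This is not the approach taken in the paper, which concerns the browser DOM; the `csv` and `row` element names, the sample data, and the assumption that column headers are valid XML names are all illustrative.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_tree(text: str) -> ET.Element:
    """Map each CSV record to a <row> element with one child per column.

    Assumes the header names are valid XML element names.
    """
    root = ET.Element("csv")
    for record in csv.DictReader(io.StringIO(text)):
        row = ET.SubElement(root, "row")
        for name, value in record.items():
            ET.SubElement(row, name).text = value
    return root


# Hypothetical sample data.
CSV_TEXT = "name,city\nAda,London\nLinus,Helsinki\nAlan,London\n"

tree = csv_to_tree(CSV_TEXT)

# Select the rows whose <city> child has the text "London".
london_names = [
    row.findtext("name") for row in tree.findall("./row[city='London']")
]
```

Once the data is in tree form, the same path syntax used for XML documents applies to it unchanged.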


2012
Author(s):  
Ren Hui Gong, Ziv Yaniv

The Insight Segmentation and Registration Toolkit (ITK) previously provided a framework for parsing Extensible Markup Language (XML) documents using the Simple API for XML (SAX) framework. While this programming model is memory efficient, it places most of the implementation burden on the user. We provide an implementation of the Document Object Model (DOM) framework for parsing XML documents. Using this model, user code is greatly simplified, shifting most of the implementation burden from the user to the framework. The provided implementation consists of two tiers. The lower level tier provides functionality for parsing XML documents and loading the tree structure into memory. It then allows the user to query and retrieve specific entries. The upper tier uses this functionality to provide an interface for mimicking a serialization and de-serialization mechanism for ITK objects. The implementation described in this document was incorporated into ITK as part of release 4.2.
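
The difference in implementation burden between the two models can be sketched with Python's standard `xml.sax` and `xml.dom.minidom` modules. This is not ITK's C++ API; the `params`/`param` document layout and the function names are hypothetical.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical parameter file; the element and attribute names are illustrative.
XML = (
    b"<params>"
    b"<param name='sigma'>1.5</param>"
    b"<param name='iterations'>200</param>"
    b"</params>"
)


class ParamHandler(xml.sax.ContentHandler):
    """SAX: the user tracks parsing state across callbacks."""

    def __init__(self):
        super().__init__()
        self.params = {}
        self._name = None
        self._chunks = []

    def startElement(self, name, attrs):
        if name == "param":
            self._name = attrs.get("name")
            self._chunks = []

    def characters(self, content):
        # Text may arrive in several pieces, so collect it.
        if self._name is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "param" and self._name is not None:
            self.params[self._name] = "".join(self._chunks)
            self._name = None


def read_params_sax(data: bytes) -> dict:
    handler = ParamHandler()
    xml.sax.parseString(data, handler)
    return handler.params


def read_params_dom(data: bytes) -> dict:
    """DOM: the framework builds the tree and the user only queries it."""
    doc = minidom.parseString(data)
    return {
        node.getAttribute("name"): node.firstChild.data
        for node in doc.getElementsByTagName("param")
    }


sax_params = read_params_sax(XML)
dom_params = read_params_dom(XML)
```

The DOM version is a single query over a tree held in memory, while the SAX version needs an explicit state machine but never holds more than one parameter at a time.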


2021, Vol 22 (5), pp. 481-508
Author(s):  
Robert P. Carlyon, Tobias Goehring

Abstract Cochlear implants (CIs) are the world’s most successful sensory prosthesis and have been the subject of intense research and development in recent decades. We critically review the progress in CI research, and its success in improving patient outcomes, from the turn of the century to the present day. The review focuses on the processing, stimulation, and audiological methods that have been used to try to improve speech perception by human CI listeners, and on fundamental new insights in the response of the auditory system to electrical stimulation. The introduction of directional microphones and of new noise reduction and pre-processing algorithms has produced robust and sometimes substantial improvements. Novel speech-processing algorithms, the use of current-focusing methods, and individualised (patient-by-patient) deactivation of subsets of electrodes have produced more modest improvements. We argue that incremental advances have been and will continue to be made, that collectively these may substantially improve patient outcomes, but that the modest size of each individual advance will require greater attention to experimental design and power. We also briefly discuss the potential and limitations of promising technologies that are currently being developed in animal models, and suggest strategies for researchers to collectively maximise the potential of CIs to improve hearing in a wide range of listening situations.


2020, Vol 3 (4), pp. 257-264
Author(s):  
Catherine J Hutchings

Abstract Antibodies are now well established as therapeutics with many additional advantages over small molecules and peptides relative to their selectivity, bioavailability, half-life and effector function. Major classes of membrane-associated protein targets include G protein-coupled receptors (GPCRs) and ion channels that are linked to a wide range of disease indications across all therapeutic areas. This mini-review summarizes the antibody target landscape for both GPCRs and ion channels as well as current progress in the respective research and development pipelines with some example case studies highlighted from clinical studies, including those being evaluated for the treatment of symptoms in COVID-19 infection.

