Structure- and Content-Based Retrieval for XML Documents

Source: Human Computer Interaction
DOI: 10.4018/978-1-878289-91-9.ch010
Year: 2011
Pages: 153-166

Author(s): Jae Woo Chang, Du-Seok Jin

Keyword(s): Retrieval System, Document Retrieval, Basic Element, Index Structure, Content Based Retrieval, Xml Documents, Storage Overhead, Xml Document, And Storage, System Effectiveness

As the number of XML documents is increasing dramatically, it is necessary to develop an XML document retrieval system that can support both structure-based retrieval and content-based retrieval. To support structure-based retrieval, we design four efficient index structures (keyword, structure, element, and attribute indexes) by indexing XML documents based on a basic element unit. To support content-based retrieval, we design a high-dimensional index structure based on the X-tree so as to store and retrieve both color and shape feature vectors efficiently. Finally, we evaluate the performance of our XML document retrieval system in terms of system efficiency, such as retrieval time, insertion time, and storage overhead, as well as system effectiveness, such as recall and precision measures.
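
The recall and precision measures mentioned in this abstract have standard definitions, which a short sketch can make concrete. The function name `precision_recall` and the document identifiers below are illustrative assumptions and are not taken from the authors' system.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query result: 4 documents returned, 3 documents truly relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
print(f"precision={p:.2f} recall={r:.2f}")
```

Here two of the four retrieved documents are relevant (precision 1/2), and two of the three relevant documents were found (recall 2/3).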

The Foundations of XML and WSDL

Source: Building and Managing Enterprise-Wide Portals
DOI: 10.4018/978-1-59140-661-7.ch002
Year: 2011
Pages: 7-31

Author(s): Jana Polgar, Robert Mark Braum, Tony Polgar

Keyword(s): Web Service, Development Stage, Xml Schema, Document Structure, Xml Documents, Type Definition, Xml Document, Extensible Markup, Basic Concepts, And Performance

XML stands for Extensible Markup Language (http://www.w3.org/XML/), and it has been adopted by industry for exchanging data in a platform-, language-, and protocol-independent fashion. While XML has many benefits during the development stage, it has some performance disadvantages. This chapter provides a quick look at the following topics: 1. Overview of the standard and basic concepts; 2. Basic XML document structure; 3. Information about the usage of Document Type Definition (DTD); 4. Structure and usage of XML Schema; and 5. Discussion of the design and performance issues that arise when using XML documents with Web services.

Automatic Mapping of XML Documents into Relational Database

Source: Advances in Data Mining and Database Management - Design, Performance, and Analysis of Innovative Information Retrieval
DOI: 10.4018/978-1-4666-1975-3.ch013
Year: 2013
Pages: 180-186

Author(s): Ibrahim Dweib, Joan Lu

Keyword(s): Relational Database, Execution Time, Database Systems, Document Structure, Xml Documents, Xml Document, Extensible Markup, Mapping Process, Automatic Mapping, Relational Database Systems

Extensible Markup Language (XML) is now one of the most important standard media for exchanging and representing data over the Internet. Storing, updating, and retrieving the huge amount of web services data held as XML is an attractive area of research for researchers and database vendors. In this chapter, the authors propose and develop a new mapping model, called MAXDOR, for storing, rebuilding, updating, and querying XML documents using a relational database without making use of any XML schemas in the mapping process. The model addresses the structural hole between ordered hierarchical XML and unordered tabular relational databases, making it possible to use relational database systems for storing, updating, and querying XML data. A multiple link list is used to maintain the XML document structure, manage the process of updating document contents, and retrieve document contents efficiently. Experiments are carried out to evaluate the MAXDOR model. MAXDOR is compared with other well-known models available in the literature (Tatarinov et al., 2002; Torsten et al., 2004) using the total expected execution time for rebuilding an XML document and for inserting a token.
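
To illustrate the general idea of storing an ordered XML tree in unordered relational tables without consulting a schema, here is a minimal sketch. It is not MAXDOR: it uses a plain node table with a parent reference and a sibling position instead of the multiple link list described in the abstract, and the table layout and the function names `shred` and `rebuild` are assumptions made for illustration. Attributes and tail text are ignored, and one database holds a single document.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred(xml_text, conn):
    """Store every element as one row; (parent, pos) preserves document order."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS node ("
        "id INTEGER PRIMARY KEY, parent INTEGER, pos INTEGER, tag TEXT, text TEXT)"
    )
    next_id = 0

    def visit(elem, parent, pos):
        nonlocal next_id
        next_id += 1
        node_id = next_id
        conn.execute(
            "INSERT INTO node VALUES (?, ?, ?, ?, ?)",
            (node_id, parent, pos, elem.tag, (elem.text or "").strip()),
        )
        for i, child in enumerate(elem):
            visit(child, node_id, i)

    visit(ET.fromstring(xml_text), None, 0)


def rebuild(conn, node_id=1):
    """Rebuild the element tree rooted at node_id from the node table."""
    tag, text = conn.execute(
        "SELECT tag, text FROM node WHERE id = ?", (node_id,)
    ).fetchone()
    elem = ET.Element(tag)
    elem.text = text or None
    children = conn.execute(
        "SELECT id FROM node WHERE parent = ? ORDER BY pos", (node_id,)
    ).fetchall()
    for (child_id,) in children:
        elem.append(rebuild(conn, child_id))
    return elem


conn = sqlite3.connect(":memory:")
shred("<order><item>pen</item><item>ink</item></order>", conn)
roundtrip = ET.tostring(rebuild(conn), encoding="unicode")
print(roundtrip)
```

The `pos` column is what recovers sibling order from an otherwise unordered table; any order-preserving encoding could play the same role.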

XML Schema Integration and E-Commerce

Source: Electronic Commerce
DOI: 10.4018/978-1-59904-943-4.ch026
Year: 2011
Pages: 286-291

Author(s): Kalpdrum Passi, Louise Lane, Sanjay Madria, Mukesh Mohania

Keyword(s): Incomplete Data, Xml Schema, Structured Data, Markup Language, Schema Integration, Semantic Meaning, Data Set, Xml Documents, Xml Document, Extensible Markup

XML (eXtensible Markup Language) is used to describe semi-structured data, i.e., irregular or incomplete data whose structure may be subject to unpredictable changes. Unlike traditional semi-structured data, XML documents are self-describing; XML thus provides a platform-independent means to describe data and can therefore transport data from one platform to another (Bray, Paoli, & Sperberg-McQueen, 1998). XML documents can be both created and used by applications. The valid content, allowed structure, and metadata properties of XML documents are described by their related schema(s) (Thompson, Beech, Maloney, & Mendelsohn, 2001). An XML document is said to be valid if it conforms to its related schema. A schema also gives additional semantic meaning to the data it is used to tag. The schema is provided independently of the data it describes. Any given data set may rely on multiple schemas for validation. Any given schema may itself refer to multiple schemas.

XML Schema Integration and E-Commerce

Source: Encyclopedia of Information Science and Technology, First Edition
DOI: 10.4018/978-1-59140-553-5.ch555
Year: 2005
Pages: 3118-3121

Author(s): Kalpdrum Passi, Louise Lane, Sanjay Madria, Mukesh Mohania

Keyword(s): Incomplete Data, Xml Schema, Structured Data, Markup Language, Schema Integration, Semantic Meaning, Data Set, Xml Documents, Xml Document, Extensible Markup

XML (eXtensible Markup Language) is used to describe semi-structured data, i.e., irregular or incomplete data whose structure may be subject to unpredictable changes. Unlike traditional semi-structured data, XML documents are self-describing; XML thus provides a platform-independent means to describe data and can therefore transport data from one platform to another (Bray, Paoli, & Sperberg-McQueen, 1998). XML documents can be both created and used by applications. The valid content, allowed structure, and metadata properties of XML documents are described by their related schema(s) (Thompson, Beech, Maloney, & Mendelsohn, 2001). An XML document is said to be valid if it conforms to its related schema. A schema also gives additional semantic meaning to the data it is used to tag. The schema is provided independently of the data it describes. Any given data set may rely on multiple schemas for validation. Any given schema may itself refer to multiple schemas.

Discovering XML Conditional Dependencies for Data Quality Issues

Source: European Journal of Electrical Engineering and Computer Science
DOI: 10.24018/ejece.2020.4.1.156
Year: 2020
Volume: 4 (1)

Author(s): Mohammed Ragheb Hakawati, Yasmin Yacob, Amiza Amir, Jabiry M. Mohammed, Khalid Jamal Jadaa

Keyword(s): Data Quality, Primary Standard, Markup Language, Document Type, Data Dependencies, Master Data, Xml Document, Extensible Markup, Quality Issues, Mining Algorithms

Extensible Markup Language (XML) is emerging as the primary standard for representing and exchanging data; at more than 60% of the total, XML is considered the most dominant document type on the web. Nevertheless, the quality of XML documents is not as expected. XML integrity constraints, especially XFDs, play an important role in keeping an XML dataset as consistent as possible, but their ability to solve data quality issues is still intangible. The main reason is that old-fashioned data dependencies were introduced to maintain the consistency of the schema rather than that of the data. The purpose of this study is to introduce a method for discovering pattern tableaus for XML conditional dependencies, to be used for enhancing XML document consistency as part of the data quality improvement phases. Conditional dependencies, as new rules, are designed mainly to improve data instances; they extend traditional XML dependencies by enforcing pattern tableaus of semantically related constants. Subsequently, a set of minimal approximate conditional dependencies (XCFD, XCIND) is discovered and learned from the XML tree using a set of mining algorithms. The discovered patterns can be used as master data to detect inconsistencies that do not conform to the majority of the dataset.

Abstract DTD Graph from an XML Document

Source: Advances in Database Research - Principle Advancements in Database Management Technologies
DOI: 10.4018/978-1-60566-904-5.ch010
Year: 2010
Pages: 204-224

Author(s): Joseph Fong, Herbert Shiu

Keyword(s): Relational Database, The Internet, Conceptual Schema, Reverse Engineer, Xml Documents, Data Interchange, Data Semantics, Xml Document, Extensible Markup, Implicit And Explicit

Extensible Markup Language (XML) has become a standard for persistent storage and data interchange via the Internet due to its openness, self-descriptiveness, and flexibility. This chapter proposes a systematic approach to reverse engineer arbitrary XML documents into their conceptual schema, the Extended DTD Graph, which is a DTD Graph with data semantics. The proposed approach not only determines the structure of the XML document, but also derives candidate data semantics from the XML element instances by treating each XML element instance as a record in a table of a relational database. One application of the determined data semantics is to verify the linkages among elements. Implicit and explicit referential linkages among XML elements are modeled by the parent-children structure and ID/IDREF(S), respectively. As a result, an arbitrary XML document can be reverse engineered into its conceptual schema in an Extended DTD Graph format.

On an Enhancement of XML Applied for Mobile E-Commerce

Source: Journal of Electronic Commerce in Organizations
DOI: 10.4018/jeco.2012070102
Year: 2012
Volume: 10 (3)
Pages: 13-26

Author(s): Xiaomin Zhu, Zhongxiang He, Shengbo Shi

Keyword(s): Data Processing, Mobile Computing, Web Service, Size Effects, Processing Time, The Internet, Markup Language, Mobile Web, Xml Documents, Extensible Markup

Extensible Markup Language (XML) is a textual markup language that is becoming increasingly important in Internet web services. However, XML has some distinct disadvantages, such as its inherent redundancy, which consumes a great deal of limited network bandwidth, especially in mobile computing. Given the characteristics of mobile commerce, handset memory capacity and data processing time are two obstacles to applying XML. This paper studies an enhancement of XML for use in mobile e-commerce, called SXML (Simple XML), intended to improve the XML used in mobile web services. It helps XML producers minimize the size effects of XML, e.g., the size overhead and slow implementation speed. Comprehensive simulations show that SXML could reduce the size of XML documents and the time of implementation, and consequently use bandwidth more effectively.

Mining Association Rules from XML Documents

Source: Enterprise Information Systems
DOI: 10.4018/978-1-61692-852-0.ch321
Year: 2011
Pages: 879-899

Author(s): Laura Irina Rusu, Wenny Rahayu, David Taniar

Keyword(s): Knowledge Discovery, Association Rules, Web Application, Semistructured Data, Markup Language, Xml Documents, Rapid Changes, Extensible Markup, Hidden Knowledge, The Web

This chapter presents some of the existing mining techniques for extracting association rules from XML documents in the context of rapid changes in the Web knowledge discovery area. The study was motivated by the fast emergence of XML (eXtensible Markup Language) as a standard language for representing semistructured data and as a new standard for exchanging information between different applications. The data exchanged as XML documents become richer every day, so the need not only to store these large volumes of XML data for later use, but also to mine them to discover interesting information, has become obvious. The hidden knowledge can be used in various ways, for example, to decide on a business issue or to make predictions about future e-customer behaviour in a Web application. One type of knowledge that can be discovered in a collection of XML documents relates to association rules between parts of the document, and this chapter presents some of the top techniques for extracting them.
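
As a rough illustration of what an association rule over XML data looks like, the following sketch computes support and confidence for item pairs found in a small XML document. It is a brute-force pairwise pass, not one of the techniques surveyed in the chapter; the `<orders>`/`<order>`/`<item>` vocabulary, the sample data, and the function names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from itertools import combinations

# Hypothetical sample data: each <order> is treated as one transaction.
XML = """<orders>
  <order><item>pen</item><item>ink</item></order>
  <order><item>pen</item><item>ink</item><item>paper</item></order>
  <order><item>pen</item><item>paper</item></order>
  <order><item>ink</item></order>
</orders>"""


def transactions(xml_text):
    """Extract one set of item names per <order> element."""
    root = ET.fromstring(xml_text)
    return [frozenset(i.text for i in order.findall("item"))
            for order in root.findall("order")]


def support(itemset, txs):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in txs if itemset <= t) / len(txs)


def rules(txs, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence) for single-item rules lhs -> rhs."""
    items = sorted(set().union(*txs))
    found = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, txs)
        if pair_support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs}, txs)
            if confidence >= min_confidence:
                found.append((lhs, rhs, pair_support, confidence))
    return found


found_rules = rules(transactions(XML), min_support=0.5, min_confidence=0.9)
print(found_rules)
```

With these thresholds the only surviving rule is paper -> pen: both items appear together in half of the orders, and every order containing paper also contains pen.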

Querying XML documents in logic programming

Source: Theory and Practice of Logic Programming
DOI: 10.1017/s1471068407003183
Year: 2008
Volume: 8 (3)
Pages: 323-361
Cited By ~ 12

Author(s): J. M. ALMENDROS-JIMÉNEZ, A. BECERRA-TERÓN, F. J. ENCISO-BAÑOS

Keyword(s): Logic Programming, Large Scale, Query Language, Electronic Publishing, Logic Program, Main Memory, Secondary Memory, Xml Documents, Xml Document, Extensible Markup

Extensible Markup Language (XML) is a simple, very flexible text format derived from SGML. Originally designed to meet the challenges of large-scale electronic publishing, XML is also playing an increasingly important role in the exchange of a wide variety of data on the Web and elsewhere. The XPath language is the result of an effort to provide a way to address parts of an XML document. In support of this primary purpose, it also serves as a query language over an XML document. In this paper we present a proposal for the implementation of the XPath language in logic programming. To this end, we describe the representation of XML documents by means of a logic program. Rules and facts can be used to represent the document schema and the XML document itself. In particular, we present how to index XML documents in logic programs: rules are assumed to be stored in main memory, whereas facts are stored in secondary memory using two kinds of indexes, one for each XML tag and another for each group of terminal items. In addition, we study how to query a logic program representing an XML document by means of the XPath language. This involves specializing the logic program with regard to the XPath expression. Finally, we explain how to combine the indexing and the top-down evaluation of the logic program.
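
The idea of representing an XML document as a set of facts and answering a path query over those facts can be sketched in a few lines. This is only an analogy written in Python, not the authors' logic-programming encoding: the fact layout `(node_id, parent_id, tag, text)`, the function names `to_facts` and `select`, and the restriction to absolute child-axis paths are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def to_facts(xml_text):
    """Flatten a document into (node_id, parent_id, tag, text) facts.

    parent_id 0 stands for a virtual document node above the root element.
    """
    facts = []

    def visit(elem, parent_id):
        node_id = len(facts) + 1
        facts.append((node_id, parent_id, elem.tag, (elem.text or "").strip()))
        for child in elem:
            visit(child, node_id)

    visit(ET.fromstring(xml_text), 0)
    return facts


def select(facts, path):
    """Evaluate an absolute child-axis path such as /books/book/title."""
    current = {0}  # start at the virtual document node
    for step in path.strip("/").split("/"):
        current = {nid for nid, parent, tag, _ in facts
                   if parent in current and tag == step}
    return [text for nid, _, _, text in facts if nid in current]


DOC = ("<books><book><title>A</title></book>"
       "<book><title>B</title><year>2008</year></book></books>")
facts = to_facts(DOC)
titles = select(facts, "/books/book/title")
print(titles)
```

Each step of the path narrows the set of matching node identifiers, loosely mirroring how a query can be resolved against stored facts one location step at a time.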
