Mining Association Rules from XML Documents

2011 ◽  
pp. 879-899
Author(s):  
Laura Irina Rusu ◽  
Wenny Rahayu ◽  
David Taniar

This chapter presents some of the existing mining techniques for extracting association rules out of XML documents in the context of rapid changes in the Web knowledge discovery area. The initiative of this study was driven by the fast emergence of XML (eXtensible Markup Language) as a standard language for representing semistructured data and as a new standard of exchanging information between different applications. The data exchanged as XML documents become richer and richer every day, so the necessity to not only store these large volumes of XML data for later use, but to mine them as well to discover interesting information has became obvious. The hidden knowledge can be used in various ways, for example, to decide on a business issue or to make predictions about future e-customer behaviour in a Web application. One type of knowledge that can be discovered in a collection of XML documents relates to association rules between parts of the document, and this chapter presents some of the top techniques for extracting them.

Author(s):  
Laura Irina Rusu ◽  
Wenny Rahayu ◽  
David Taniar

This chapter presents some of the existing mining techniques for extracting association rules out of XML documents in the context of rapid changes in the Web knowledge discovery area. The initiative of this study was driven by the fast emergence of XML (eXtensible Markup Language) as a standard language for representing semistructured data and as a new standard of exchanging information between different applications. The data exchanged as XML documents become richer and richer every day, so the necessity to not only store these large volumes of XML data for later use, but to mine them as well to discover interesting information has became obvious. The hidden knowledge can be used in various ways, for example, to decide on a business issue or to make predictions about future e-customer behaviour in a Web application. One type of knowledge that can be discovered in a collection of XML documents relates to association rules between parts of the document, and this chapter presents some of the top techniques for extracting them.


2007 ◽  
pp. 79-103 ◽  
Author(s):  
Laura Irina Rusu ◽  
Wenny Rahayu ◽  
David Taniar

This chapter presents some of the existing mining techniques for extracting association rules out of XML documents, in the context of rapid changes in the Web knowledge discovery area. The initiative of this study was driven by the fast emergence of XML (eXtensible Markup Language) as a standard language for representing semi-structured data and as a new standard of exchanging information between different applications. The data exchanged as XML documents becomes every day richer and richer, so the necessity to not only store these large volume of XML data for later use, but to mine them as well, to discover interesting information, has became obvious. The hidden knowledge can be used in various ways, for example to decide on a business issue or to make predictions about future e-customer behaviour in a web-application. One type of knowledge which can be discovered in a collection of XML documents relates to association rules between parts of the document, and this chapter presents some of the top techniques for extracting them.


Author(s):  
Mohammad Moradi ◽  
MohammadReza Keyvanpour

Since the early days of introducing eXtensible Markup Language (XML), owing to its expressive capabilities and flexibilities, it became the defacto standard for representing, storing, and interchanging data on the Web. Such features have made XML one of the building blocks of the Semantic Web. From another viewpoint, since XML documents could be considered from content, structural, and semantic aspects, leveraging their semantics is very useful and applicable in different domains. However, XML does not by itself introduce any built-in mechanisms for governing semantics. For this reason, many studies have been conducted on the representation of semantics within/from XML documents. This paper studies and discusses different aspects of the mentioned topic in the form of an overview with an emphasis on the state of semantics in XML and its presentation methods.


Author(s):  
Adélia Gouveia ◽  
Jorge Cardoso

The World Wide Web (WWW) emerged in 1989, developed by Tim Berners-Lee who proposed to build a system for sharing information among physicists of the CERN (Conseil Européen pour la Recherche Nucléaire), the world’s largest particle physics laboratory. Currently, the WWW is primarily composed of documents written in HTML (hyper text markup language), a language that is useful for visual presentation (Cardoso & Sheth, 2005). HTML is a set of “markup” symbols contained in a Web page intended for display on a Web browser. Most of the information on the Web is designed only for human consumption. Humans can read Web pages and understand them, but their inherent meaning is not shown in a way that allows their interpretation by computers (Cardoso & Sheth, 2006). Since the visual Web does not allow computers to understand the meaning of Web pages (Cardoso, 2007), the W3C (World Wide Web Consortium) started to work on a concept of the Semantic Web with the objective of developing approaches and solutions for data integration and interoperability purpose. The goal was to develop ways to allow computers to understand Web information. The aim of this chapter is to present the Web ontology language (OWL) which can be used to develop Semantic Web applications that understand information and data on the Web. This language was proposed by the W3C and was designed for publishing, sharing data and automating data understood by computers using ontologies. To fully comprehend OWL we need first to study its origin and the basic blocks of the language. Therefore, we will start by briefly introducing XML (extensible markup language), RDF (resource description framework), and RDF Schema (RDFS). These concepts are important since OWL is written in XML and is an extension of RDF and RDFS.


Author(s):  
Domenic Denicola

XML's steady descent into obscurity has become more and more apparent over the last few years. Developers, tool vendors, and browser implementers have all embraced HTML as the web's markup language, built on a substrate of JavaScript. Nothing epitomizes this shift more than the recent rise of web components: instead of standards committees dreaming up domain-specific XML vocabularies and hoping one day browsers would incorporate them, web components and the extensible web principles they embody allow authors to empower HTML with the same abilities XML once promised. The HTML of today is a truly extensible markup language. Where XML failed in this mission, both historically and practically, the web ecosystem routed around the damage of XML's influence by making HTML better suited for extensibility than ever before.


2019 ◽  
Vol 18 (04) ◽  
pp. 1950048
Author(s):  
Amjad Qtaish ◽  
Mohammad T. Alshammari

Extensible Markup Language (XML) has become a common language for data interchange and data representation in the Web. The evolution of the big data environment and the large volume of data which is being represented by XML on the Web increase the challenges in effectively managing such data in terms of storing and querying. Numerous solutions have been introduced to store and query XML data, including the file systems, Object-Oriented Database (OODB), Native XML Database (NXD), and Relational Database (RDB). Previous research attempts indicate that RDB is the most powerful technology for managing XML data to date. Because of the structure variations of XML and RDB, the need to map XML data to an RDB scheme is increased. This growth has prompted numerous researchers and database vendors to propose different approaches to map XML documents to an RDB, translating different types of XPath queries to SQL queries and returning the results to an XML format. This paper aims to comprehensively review most cited and latest mapping approaches and database vendors that use RDB solution to store and query XML documents, in a narrative manner. The advantages and the drawbacks of each approach is discussed, particularly in terms of storing and querying. The paper also provides some insight into managing XML documents using RDB solution in terms of storing and querying and contributes to the XML community.


2012 ◽  
Vol 10 (3) ◽  
pp. 13-26
Author(s):  
Xiaomin Zhu ◽  
Zhongxiang He ◽  
Shengbo Shi

Extensible Markup Language (XML) is a textual markup language which becomes more and more important in the Internet web service. However, some distinct disadvantages exist in XML, such as its nature of redundancy, which consumes the limited network’s bandwidth greatly especially in mobile computing. Considering the characteristics of the mobile commerce, the handsets’ memory capability and data processing time are two problems for XML being applied. This paper studies an enhancement of XML for the purpose of application in mobile e-commerce, called SXML, which means Simple XML to enhance the XML used in mobile web service. It helps XML producers minimizing the size effects of XML, e.g., the size overhead and slow implementation speed. Comprehensive simulations show that the SXML could reduce the size of XML documents and reduce the time of implementation, consequently utilize the bandwidth effectively.


Author(s):  
Richard Crowder ◽  
Yee-Wie Sim

Organisations are increasingly information intensive; hence providing access to data that is trapped in various proprietary forms including catalogues, databases, human resource systems and internally generated documents is now becoming a significant and challenging task. The authors have undertaken research into approaches to capture relevant knowledge from legacy documents. This is achieved by converting the legacy documents to XML, (eXtensible Markup Language), documents where the output is semantically tagged. Once in an XML form, the data can be easily transformed. This paper describes the development of tools to automate the process of converting legacy documents to XML documents. The purpose of this work is improve the efficiency and reliability of Expertise Finder suitable for use within an engineering design environment. We will also show that by querying the resultant XML versions of legacy documents provides better results than a basic text search over the identical documents when applied used within an Expertise Finder.


Author(s):  
Albrecht Schmidt ◽  
Stefan Manegold ◽  
Martin Kersten

Ever since the Extensible Markup Language (XML) (W3C, 1998b) began to be used to exchange data between diverse sources, interest has grown in deploying data management technology to store and query XML documents. A number of approaches propose to adapt relational database technology to store and maintain XML documents (Deutsch, Fernandez & Suciu, 1999; Florescu & Kossmann, 1999; Klettke & Meyer, 2000; Shanmugasundaram et al., 1999; Tatarinov et al., 2002; O’Neil et al., 2004). The advantage is that the XML repository inherits all the power of mature relational technology like indexes and transaction management. For XML-enabled querying, a declarative query language (Chamberlin et al., 2001) is available.


Author(s):  
Badya Al-Hamadani ◽  
Joan Lu

The eXtensible Markup Language (XML) is a World Wide Web Consortium (W3C) recommendation which has widely been used in both commerce and research. As the importance of XML documents increase, the need to deal with these documents increases as well. This chapter illustrates the methodology that has been used throughout the research, discussing all its parts and how these parts were adopted in the research.


Sign in / Sign up

Export Citation Format

Share Document