Web Mining
Latest Publications


Total documents: 19 (five years: 0)

H-index: 4 (five years: 0)

Published By IGI Global

9781591404149, 9781591404163

Web Mining ◽  
2011 ◽  
pp. 355-372
Author(s):  
Juan M. Hernansaez

In this chapter we focus on the three approaches that seem to be the most successful in the Web usage mining area: clustering, association rules and sequential patterns. We discuss some techniques from each of these approaches, and then show the benefits of using METALA (a META-Learning Architecture) as an integrating tool not only for the discussed Web usage mining techniques, but also for inductive learning algorithms. As we will show, this architecture can also be used to generate new theories and models that can be useful to provide new generic applications for several supervised and non-supervised learning paradigms. As a particular example of a Web usage mining application, we report our work for a medium-sized commercial company, and we discuss some interesting properties and conclusions that we have obtained from it.
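To make the association-rule approach named in this abstract concrete, here is a minimal sketch that derives page-to-page rules from user sessions using the standard support and confidence measures. It is not METALA and not the chapter's own technique; the function name `pair_rules`, the thresholds, and the sample sessions are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) rules between pairs of pages.

    Hypothetical helper: each session is a list of page names, and a page
    counts at most once per session.
    """
    n = len(sessions)
    page_counts = Counter()
    pair_counts = Counter()
    for session in sessions:
        pages = sorted(set(session))
        page_counts.update(pages)
        pair_counts.update(combinations(pages, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of sessions containing both pages
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # confidence: of the sessions containing lhs, how many also contain rhs
            confidence = count / page_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Made-up sessions, for illustration only.
sessions = [
    ["home", "products", "cart"],
    ["home", "products"],
    ["home", "about"],
    ["products", "cart"],
]
rules = pair_rules(sessions)
```

Restricting rules to page pairs keeps the sketch short; a full Apriori-style miner would extend frequent pairs to larger itemsets level by level.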


Web Mining ◽  
2011 ◽  
pp. 322-338 ◽  
Author(s):  
Zhixiang Chen ◽  
Richard H. Fowler ◽  
Ada Wai-Chee Fu ◽  
Chunyue Wang

A maximal forward reference of a Web user is a longest consecutive sequence of Web pages visited by the user in a session without revisiting some previously visited page in the sequence. Efficient mining of frequent traversal path patterns, that is, large reference sequences of maximal forward references, from very large Web logs is a fundamental problem in Web mining. This chapter aims at designing algorithms for this problem with the best possible efficiency. First, two optimal linear time algorithms are designed for finding maximal forward references from Web logs. Second, two algorithms for mining frequent traversal path patterns are devised with the help of a fast construction of shallow generalized suffix trees over a very large alphabet. These two algorithms have provable linear and sublinear time complexity, respectively, and their performance is analyzed in comparison with the Apriori-like algorithms and the Ukkonen algorithm. It is shown that these two new algorithms are substantially more efficient than the Apriori-like algorithms and the Ukkonen algorithm.
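The definition of a maximal forward reference above can be illustrated with a short sketch. This is a straightforward single-pass extraction written for clarity, not the chapter's optimal algorithms; the function name and the choice to ignore page reloads are assumptions.

```python
def maximal_forward_references(session):
    """Split one session (a list of page IDs) into maximal forward references.

    Illustrative sketch only. A revisit of a page already on the current
    path is treated as a backward move: the path built so far is emitted
    (if the previous move was forward) and then cut back to that page.
    """
    path = []          # current forward path from the session start
    position = {}      # page -> index of that page in `path`
    extending = False  # True while the most recent move was a forward move
    result = []

    for page in session:
        if path and page == path[-1]:
            continue  # assumption: a reload of the current page is ignored
        if page in position:
            if extending:
                result.append(list(path))
            keep = position[page] + 1
            for dropped in path[keep:]:
                del position[dropped]
            del path[keep:]
            extending = False
        else:
            position[page] = len(path)
            path.append(page)
            extending = True

    if extending:
        result.append(list(path))
    return result


# A made-up session with several backward moves.
session = list("ABCDCBEGHGWAOUOV")
references = ["".join(p) for p in maximal_forward_references(session)]
```

Copying the path on every emission makes the running time depend on the total output size; the sketch makes no claim to match the efficiency the chapter reports.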


Web Mining ◽  
2011 ◽  
pp. 307-321 ◽  
Author(s):  
Ricardo Baeza-Yates

Search engine logs not only keep navigation information, but also the queries made by their users. In particular, queries to a search engine follow a power-law distribution, which is far from uniform. Queries and related clicks can be used to improve the search engine itself in different aspects: user interface, index performance, and answer ranking. In this chapter we present some of the main ideas proposed in query mining and we show a few examples based on real data from a search engine focused on the Chilean Web.
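As a rough illustration of the power-law claim, the sketch below ranks queries by frequency and fits a straight line to the rank-frequency curve in log-log space. The function names and the sample data are assumptions, and a least-squares fit on log-log axes is only a quick diagnostic, not a rigorous estimator of a power-law exponent.

```python
import math
from collections import Counter


def rank_frequency(queries):
    """Return query frequencies sorted from most to least frequent."""
    counts = Counter(q.strip().lower() for q in queries)
    return sorted(counts.values(), reverse=True)


def loglog_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank).

    Needs at least two distinct queries. A slope near -1 is what a
    Zipf-like distribution would give.
    """
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic query log whose frequencies are exactly 12 / rank.
queries = ["a"] * 12 + ["b"] * 6 + ["c"] * 4 + ["d"] * 3
slope = loglog_slope(rank_frequency(queries))
```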


Web Mining ◽  
2011 ◽  
pp. 228-252 ◽  
Author(s):  
Mohamed Salah Hamdi

Rapidly evolving network and computer technology, coupled with the exponential growth of the services and information available on the Internet, has already brought us to the point where hundreds of millions of people should have fast, pervasive access to a phenomenal amount of information, through desktop machines at work, school and home, through televisions, phones, pagers, and car dashboards, from anywhere and everywhere. The challenge of complex environments is therefore obvious: software is expected to do more in more situations, there are a variety of users (Power/Naive, Techie/Financial/Clerical, ...), there are a variety of systems (Windows/NT/Mac/Unix, Client/Server, Portable, Distributed Object Manager, Web, ...), there are a variety of interactions (Real-time, Databases, Other Players, ...), and there are a variety of resources and goals (time, space, bandwidth, cost, security, quality, ...). To cope with such environments, the promise of information customization systems is becoming highly attractive. In this chapter we discuss important problems in relation to such systems and smooth the way for possible solutions. The main idea is to approach information customization using a multi-agent paradigm.


Web Mining ◽  
2011 ◽  
pp. 69-98 ◽  
Author(s):  
Roberto Navigli

Domain ontologies are widely recognized as a key element for the so-called semantic Web, an improved, “semantic aware” version of the World Wide Web. Ontologies define concepts and interrelationships in order to provide a shared vision of a given application domain. Despite the significant amount of work in the field, ontologies are still scarcely used in Web-based applications. One of the main problems is the difficulty in identifying and defining relevant concepts within the domain. In this chapter, we provide an approach to the problem, defining a method and a tool, OntoLearn, aimed at the extraction of knowledge from Websites, and more generally from documents shared among the members of virtual organizations, to support the construction of a domain ontology. Exploiting the idea that a corpus of documents produced by a community is the most representative (although implicit) repository of concepts, the method extracts a terminology, provides a semantic interpretation of relevant terms and populates the domain ontology in an automatic manner. Finally, further manual corrections are required from domain experts in order to achieve a rich and usable knowledge resource.


Web Mining ◽  
2011 ◽  
pp. 373-392 ◽  
Author(s):  
Yew-Kwong Woon ◽  
Wee-Keong Ng ◽  
Ee-Peng Lim

The rising popularity of electronic commerce makes data mining an indispensable technology for several applications, especially online business competitiveness. The World Wide Web provides abundant raw data in the form of Web access logs. However, without data mining techniques, it is difficult to make any sense out of such massive data. In this chapter, we focus on the mining of Web access logs, commonly known as Web usage mining. We analyze algorithms for preprocessing and extracting knowledge from such logs. We will also propose our own techniques to mine the logs in a more holistic manner. Experiments conducted on real Web server logs verify the practicality as well as the efficiency of the proposed techniques as compared to an existing technique. Finally, challenges in Web usage mining are discussed.
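The preprocessing step mentioned in this abstract can be sketched as follows: parse access-log lines and group each host's successful requests into sessions. This is not the authors' technique. It assumes the logs are in Common Log Format, identifies users by host alone, and uses a 30-minute inactivity timeout, which is a common heuristic rather than anything stated in the chapter; all names are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Common Log Format: host ident user [time] "method path protocol" status bytes
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def parse_line(line):
    """Return (host, timestamp, path, status), or None if the line does not match."""
    match = CLF.match(line)
    if match is None:
        return None
    timestamp = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return match.group("host"), timestamp, match.group("path"), int(match.group("status"))


def sessionize(lines, timeout=timedelta(minutes=30)):
    """Group successful requests into (host, [paths]) sessions."""
    records = sorted(filter(None, map(parse_line, lines)), key=lambda r: r[1])
    sessions = []
    latest = {}  # host -> (time of last request, page list of its open session)
    for host, timestamp, path, status in records:
        if status != 200:
            continue  # assumption: only successful requests are kept
        previous = latest.get(host)
        if previous is None or timestamp - previous[0] > timeout:
            pages = []
            sessions.append((host, pages))
        else:
            pages = previous[1]
        pages.append(path)
        latest[host] = (timestamp, pages)
    return sessions


# Made-up log lines, for illustration only.
log = [
    '10.0.0.1 - - [10/Oct/2004:13:55:36 +0000] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.1 - - [10/Oct/2004:13:57:00 +0000] "GET /products.html HTTP/1.0" 200 512',
    '10.0.0.2 - - [10/Oct/2004:13:56:00 +0000] "GET /missing HTTP/1.0" 404 0',
    '10.0.0.1 - - [10/Oct/2004:15:00:00 +0000] "GET /index.html HTTP/1.0" 200 2326',
]
sessions = sessionize(log)
```

Real preprocessing usually also filters robot traffic and embedded resources such as images; those steps are omitted here.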


Web Mining ◽  
2011 ◽  
pp. 119-144
Author(s):  
Neil C. Rowe

We survey research on using captions in data mining from the Web. Captions are text that describes some other information (typically, multimedia). Since text is considerably easier to analyze than non-text, a good way to support access to non-text is to index the words of its captions. However, captions vary considerably in form and content on the Web. We discuss the range of syntactic clues (such as HTML tags) and semantic clues (such as particular words). We discuss how to quantify clue strength and combine clues for a consensus. We then discuss the problem of mapping information in captions to information in media objects. While it is hard, classes of mapping schemes are distinguishable, and a segmentation of the media can be matched to a parse of the caption.


Web Mining ◽  
2011 ◽  
pp. 1-26 ◽  
Author(s):  
Gilbert W. Laware

This chapter introduces the need for the World Wide Web to provide a standard mechanism so individuals can readily obtain data, reports, research and knowledge about any topic posted to it. Individuals have been frustrated by this process since they are not able to access relevant data and current information. Much of the reason for this lies with metadata, the data about the data that are used in support of Web content. These metadata are non-existent, ill-defined, erroneously labeled, or, if well-defined, continue to be marked by other disparate metadata. With the ever-increasing demand for Web-enabled data mining, warehousing and management of knowledge, an organization has to address the multiple facets of process, standards, technology, data mining, and warehousing management. This requires approaches to provide an integrated interchange of quality metadata that enables individuals to access Web content with the most relevant, contemporary data, information, and knowledge that are both content-rich and practical for decision-making situations.


Web Mining ◽  
2011 ◽  
pp. 339-354 ◽  
Author(s):  
Bernard J. Jansen ◽  
Amanda Spink

This chapter reviews the concepts of Web results page and Web page viewing patterns by users of Web search engines. It presents the advantages of using traditional transaction log analysis in identifying these patterns, serving as a basis for Web usage mining. The authors also present the results of a temporal analysis of Web page viewing, illustrating that the user-information interaction is extremely short. By using real data collected from real users interacting with real Web information retrieval systems, the authors aim to highlight one aspect of the complex environment of Web information seeking.


Web Mining ◽  
2011 ◽  
pp. 208-227 ◽  
Author(s):  
Mike Thelwall

A range of techniques is described for cleansing and validating link data for use in different types of Web structure mining, and some applications are given. The main application area is Multiple Site Link Structure Analysis, which typically involves mining patterns from themed collections of Websites. The importance of data cleansing and validation stems from the fact that Web data are typically very messy. They involve extensive duplication of pages and page components, which may give meaningless results when raw Web data are analyzed.
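One plausible cleansing step of the kind described here is to normalize URLs and then count each inter-site link once, so that duplicated pages do not inflate link counts. The sketch below is an assumption-laden illustration, not the chapter's procedure: the function names are hypothetical and a "site" is approximated by the URL's host.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Lowercase the scheme and host, drop the fragment, default the path to '/'."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def site_links(links):
    """Collapse page-level (source, target) links to unique inter-site pairs.

    Links inside a single site are dropped, and repeated links between the
    same two sites count once.
    """
    pairs = set()
    for source, target in links:
        source_site = urlsplit(normalize(source)).netloc
        target_site = urlsplit(normalize(target)).netloc
        if source_site and target_site and source_site != target_site:
            pairs.add((source_site, target_site))
    return sorted(pairs)


# Made-up link data, for illustration only.
links = [
    ("http://A.example/page1.html", "http://b.example/"),
    ("http://a.example/page2.html#top", "http://B.example"),
    ("http://a.example/x", "http://a.example/y"),
]
unique_site_links = site_links(links)
```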

